Abstract: We present a novel structure-aware loss function for text image super-resolution that improves the recognition accuracy of text recognizers in natural scenes. Text image super-resolution is a particular case of general image super-resolution in which the primary goal is to improve the readability of characters in a low-resolution image by increasing its resolution. In this scenario, the general loss functions used in previous super-resolution models are insufficient to learn character shapes precisely and stably, as they often lead to blurred and broken shapes. In this paper, we propose a skeleton loss for training text super-resolution networks. The skeleton loss enables a network to generate more readable characters by taking the detailed structure of character skeletons into account during optimization. The key idea of the skeleton loss is to measure the difference between two character skeletons: one obtained from the high-resolution image and the other from the super-resolved image generated from a given low-resolution image. To implement this idea in an end-to-end form, we introduce a skeletonization network that generates skeletons from an input text image. Quantitative analysis shows that our method outperforms existing super-resolution models in terms of recognition accuracy with modern text recognizers. Furthermore, our experiments show that the skeleton loss can improve the readability of text images generated by existing super-resolution networks without modifying their structures.
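As a rough illustration of the key idea, the sketch below shows one possible way to realize a skeleton-based loss in PyTorch: a frozen skeletonization network is applied to both the super-resolved image and the high-resolution ground truth, and the loss compares the resulting skeleton maps. The toy network architecture, the class name `SkeletonLoss`, and the use of a pixel-wise L1 distance are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SkeletonLoss(nn.Module):
    """Hypothetical sketch: compare skeleton maps of the super-resolved and
    high-resolution images, both produced by a pretrained skeletonization net."""

    def __init__(self, skeleton_net: nn.Module):
        super().__init__()
        self.skeleton_net = skeleton_net
        # The skeletonization network is assumed to be pretrained;
        # freeze it so only the super-resolution network is updated.
        for p in self.skeleton_net.parameters():
            p.requires_grad = False

    def forward(self, sr_image: torch.Tensor, hr_image: torch.Tensor) -> torch.Tensor:
        sr_skeleton = self.skeleton_net(sr_image)        # skeleton of super-resolved image
        with torch.no_grad():
            hr_skeleton = self.skeleton_net(hr_image)    # skeleton of HR ground truth
        # Pixel-wise L1 difference between the two skeleton maps (an assumed choice).
        return F.l1_loss(sr_skeleton, hr_skeleton)


# Illustrative stand-in for the skeletonization network: a small fully
# convolutional net mapping an RGB text image to a single-channel skeleton map.
toy_skeleton_net = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
)

loss_fn = SkeletonLoss(toy_skeleton_net)
sr = torch.rand(2, 3, 32, 128)   # batch of super-resolved text images
hr = torch.rand(2, 3, 32, 128)   # corresponding high-resolution ground truth
print(loss_fn(sr, hr).item())
```

In practice, a term like this would be added to the super-resolution network's usual reconstruction loss, so that gradients from the skeleton comparison encourage sharper, unbroken character strokes.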