A Scene Text Image Super-Resolution Method Guided by Text Semantics in Wild

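Cite this article:	XI Chenchen, HE Xin, MENG Yalei, ZHANG Kaibing. A Scene Text Image Super-Resolution Method Guided by Text Semantics in Wild[J]. Journal of Air Force Engineering University (Natural Science Edition), 2023, 24(6): 96-103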

Authors:	XI Chenchen, HE Xin, MENG Yalei, ZHANG Kaibing

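Affiliations:	1. School of Electronics and Information, Xi'an Polytechnic University, Xi'an 710048; 2. School of Computer Science, Xi'an Polytechnic University, Xi'an 710048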

Funding:	National Natural Science Foundation of China (61971339)

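Abstract:	In natural scene text image super-resolution, prior information is often used inaccurately and insufficiently, and text edges are recovered incompletely. To address these problems, a natural scene text image super-resolution method guided by text semantics is proposed. The network consists of a super-resolution reconstruction module and a text semantic-aware module. To further improve the representational capacity of the super-resolution network, recurrent criss-cross attention is used to capture global contextual information, so that the model pays more attention to text regions during training. In addition, a soft-edge loss and a gradient loss are proposed to constrain the reconstruction process and produce super-resolution results with sharp edges. The proposed model is evaluated on the public scene text image super-resolution dataset TextZoom and compared with eight mainstream deep network models. The results show that, relative to TSRN, the model improves the average recognition accuracy under three different recognizers by 2.06%, 1.80%, and 2.89%, respectively, and also has some advantage in PSNR and SSIM.
Keywords:	scene text image super-resolution; text semantics; attention mechanism; soft-edge loss; gradient loss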
A Scene Text Image Super-Resolution Method Guided by Text Semantics in Wild

XI Chenchen, HE Xin, MENG Yalei, ZHANG Kaibing. A Scene Text Image Super-Resolution Method Guided by Text Semantics in Wild[J]. Journal of Air Force Engineering University (Natural Science Edition), 2023, 24(6): 96-103

Authors:	XI Chenchen, HE Xin, MENG Yalei, ZHANG Kaibing

Abstract:	Scene text image super-resolution suffers from inaccurate and insufficient use of prior information and from incomplete recovery of text edges. To address these problems, a scene text image super-resolution method guided by text semantics is proposed. The network is composed of a super-resolution reconstruction module and a text semantic-aware module. To further improve the representational capacity of the super-resolution network, a recurrent criss-cross attention mechanism is used to capture global contextual information, making the model pay more attention to text regions during training. In addition, a soft-edge loss and a gradient loss are proposed to constrain the reconstruction process and generate results with sharp edges. The proposed model is evaluated on the public scene text image super-resolution dataset TextZoom and compared with eight mainstream deep network models. Compared with TSRN, the proposed model improves the average recognition accuracy under three different recognizers by 2.06%, 1.80%, and 2.89%, respectively, and it also shows some advantage in PSNR and SSIM.

Keywords:	scene text image super-resolution; text semantics; attention mechanism; soft-edge loss; gradient loss
