Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (12): 3686-3691. DOI: 10.11772/j.issn.1001-9081.2021101749

• Artificial intelligence

Single direction projected Transformer method for aliasing text detection

Zhida FENG1,2, Li CHEN1,2

  1. School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan Hubei 430065, China
  2. Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System (Wuhan University of Science and Technology), Wuhan Hubei 430065, China
  • Received:2021-10-12 Revised:2022-01-24 Accepted:2022-01-24 Online:2022-04-08 Published:2022-12-10
  • Contact: Zhida FENG
  • About author: CHEN Li, born in 1977, Ph. D., professor. His research interests include computer vision and image processing.
  • Supported by:
    National Natural Science Foundation of China(61773297)

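Single direction projected Transformer method for aliasing text detection

Zhida FENG1,2, Li CHEN1,2

  1. School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430065
  2. Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System (Wuhan University of Science and Technology), Wuhan 430065
  • Corresponding author: Zhida FENG
  • About the author: CHEN Li (born 1977), male, from Wuhan, Hubei; professor, Ph. D. His main research interests are computer vision and image processing.
  • Supported by:
    National Natural Science Foundation of China (61773297)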

Abstract:

To address the performance degradation of segmentation-based text detection methods in aliasing text scenes, a Single Direction Projected Transformer (SDPT) was proposed for aliasing text detection. Firstly, multi-scale features were extracted and fused by using a deep Residual Network (ResNet) and a Feature Pyramid Network (FPN). Then, the feature map was projected into a vector sequence by horizontal projection, and the sequence was fed into a Transformer module to model the relationships between lines of text. Finally, joint optimization was performed with multiple objectives. Extensive experiments were conducted on the synthetic dataset BDD-SynText and the real dataset RealText. The results show that the proposed SDPT achieves the best results for text detection at high aliasing levels. Compared with state-of-the-art text detection algorithms such as Progressive Scale Expansion Network (PSENet) under the same backbone network (ResNet50), it improves F1-Score (IoU75) by at least 21.36 percentage points on BDD-SynText and by at least 18.11 percentage points on RealText, which verifies the important role of the proposed method in improving aliasing text detection performance.
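The projection-then-attention step described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The abstract does not name the projection operator, so mean pooling along the width axis is assumed here, and a single attention head with identity query/key/value projections stands in for the full Transformer module. The function names `horizontal_projection` and `self_attention` are hypothetical.

```python
import math


def horizontal_projection(feature_map):
    """Collapse a C x H x W feature map along the width axis.

    Returns H vectors of length C, one per image row.
    Mean pooling is an assumption; the abstract does not name the operator.
    """
    channels = len(feature_map)
    height = len(feature_map[0])
    return [
        [sum(feature_map[c][y]) / len(feature_map[c][y]) for c in range(channels)]
        for y in range(height)
    ]


def self_attention(sequence):
    """Single-head scaled dot-product self-attention.

    Identity query/key/value projections are used (an assumption made to
    keep the sketch short); a real Transformer layer learns these.
    """
    dim = len(sequence[0])
    output = []
    for query in sequence:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in sequence
        ]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        output.append(
            [sum(w * value[d] for w, value in zip(weights, sequence)) for d in range(dim)]
        )
    return output


# Toy feature map: 2 channels, 3 rows, 4 columns (rows 0 and 2 hold "text").
toy = [
    [[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]],
    [[0.0, 2.0, 0.0, 2.0], [0.0, 0.0, 0.0, 0.0], [2.0, 0.0, 2.0, 0.0]],
]
rows = horizontal_projection(toy)   # one vector per row
mixed = self_attention(rows)        # each row now attends to every other row
```

Each output vector in `mixed` is a weighted mix of all row vectors, which is how a sequence model over rows can relate one line of text to another.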

Key words: computer vision, deep learning, scene text detection, aliasing text, projection, Transformer algorithm

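Abstract:

To address the performance degradation of segmentation-based text detection methods in aliasing text scenes, a Single Direction Projected Transformer (SDPT) was proposed for aliasing text detection. First, a deep Residual Network (ResNet) and a Feature Pyramid Network (FPN) were used to extract and fuse multi-scale features. Then, horizontal projection was used to project the feature map into a vector sequence, which was fed into a Transformer module for modeling in order to mine the relationships between lines of text. Finally, multiple objectives were used for joint optimization. Extensive experiments were conducted on the synthetic dataset BDD-SynText and the real dataset RealText. The results show that the proposed SDPT achieves the best results for text detection at high aliasing levels. Compared with text detection algorithms such as PSENet under the same backbone network (ResNet50), F1-Score (IoU75) is improved by at least 21.36 percentage points on BDD-SynText and by at least 18.11 percentage points on RealText, which verifies the important role of the proposed method in improving aliasing text detection performance.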
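The F1-Score (IoU75) figures can be read with the help of a small sketch. It assumes that "IoU75" means a detection counts as correct when its intersection over union with a ground-truth region is at least 0.75. It further assumes axis-aligned boxes and greedy one-to-one matching; the paper's evaluation protocol may differ (for example, it may use polygons). The names `iou` and `f1_at_iou` are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1_at_iou(predictions, ground_truth, threshold=0.75):
    """F1 score where a prediction is a hit if its best IoU >= threshold.

    Greedy one-to-one matching is an assumption made for this sketch.
    """
    unmatched = list(ground_truth)
    true_positives = 0
    for pred in predictions:
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= threshold:
            true_positives += 1
            unmatched.remove(best)  # each ground-truth box matches once
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predictions)
    recall = true_positives / len(ground_truth)
    return 2 * precision * recall / (precision + recall)


# Two overlapping text lines; the second prediction covers only half of its line.
truth = [(0, 0, 100, 20), (0, 15, 100, 35)]
preds = [(0, 0, 100, 20), (0, 15, 100, 25)]
score = f1_at_iou(preds, truth)
```

In this toy case the second prediction reaches an IoU of only 0.5 with its line, so it misses the 0.75 threshold and the score drops to 0.5. A strict threshold of this kind penalizes detections that cut off part of an overlapping line.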

Key words: computer vision, deep learning, scene text detection, aliasing text, projection, Transformer algorithm

CLC Number: