Journal of Computer Applications (《计算机应用》) ›› 2023, Vol. 43 ›› Issue (5): 1416-1421. DOI: 10.11772/j.issn.1001-9081.2022040520

• Artificial Intelligence •

Text image editing method based on font and character attribute guidance

Jingchao CHEN 1,2, Shugong XU 1, Youdong DING 2

  1. School of Communication and Information Engineering, Shanghai University, Shanghai 200444, China
    2. Shanghai Film Academy, Shanghai University, Shanghai 200072, China
  • Received:2022-04-15 Revised:2022-06-09 Accepted:2022-06-13 Online:2022-07-01 Published:2023-05-10
  • Contact: Shugong XU (shugong@shu.edu.cn)
  • About author: CHEN Jingchao, born in 1997 in Shanghai, M.S. candidate. His research interests include text editing and font recognition.
    XU Shugong, born in 1969 in Xiangyang, Hubei, Ph.D., professor. His research interests include wireless communication and pattern recognition.
    DING Youdong, born in 1967 in Shanghang, Fujian, Ph.D., professor. His research interests include computer graphics and multimedia display.

Abstract:

To address the problems of inconsistent text style before and after editing and insufficient readability of the newly generated text in text image editing tasks, a text image editing method guided by font and character attributes was proposed. Firstly, the generation of the text foreground style was guided by a font attribute classifier combined with font classification, perceptual and texture losses, improving the consistency of text style before and after editing. Secondly, the accurate generation of text glyphs was guided by a character attribute classifier combined with a character classification loss, reducing text artifacts and generation errors and improving the readability of the generated text. Finally, an end-to-end fine-tuning strategy was used to refine the outputs of the whole staged editing model. In comparison experiments, the proposed method achieved a Peak Signal-to-Noise Ratio (PSNR) of 25.48 dB and a Structural SIMilarity (SSIM) of 0.842, which are 2.57 dB and 0.055 higher than those of SRNet (Style Retention Network) and 2.11 dB and 0.046 higher than those of SwapText, respectively; its Mean Squared Error (MSE) is 0.004 3, which is 0.003 1 and 0.002 4 lower than that of SRNet and SwapText, respectively. Experimental results show that the proposed method can effectively improve the generation quality of text image editing.
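The abstract describes guiding generation with a font attribute classifier (font classification, perceptual and texture losses) and a character attribute classifier (character classification loss). The following is a minimal, hypothetical PyTorch sketch of how such a combined attribute-guided objective could be wired up; the classifier modules, the VGG-style feature extractor, the loss weights and the single-label character supervision are all assumptions made for illustration, not the authors' implementation.

```python
# Illustrative sketch only (not the paper's code): combining the attribute-guided
# losses described in the abstract. `font_classifier`, `char_classifier`,
# `vgg_features` and the weights are hypothetical placeholders.
import torch
import torch.nn.functional as F

def gram_matrix(feat):
    """Gram matrix of a feature map, used for the texture (style) loss."""
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

def attribute_guided_loss(fake, real, font_label, char_label,
                          font_classifier, char_classifier, vgg_features,
                          w_font=1.0, w_perc=1.0, w_tex=500.0, w_char=1.0):
    # Font attribute guidance: the edited image should be classified as the
    # same font as the source style image.
    font_loss = F.cross_entropy(font_classifier(fake), font_label)

    # Perceptual and texture losses on (hypothetical) VGG feature maps.
    perc_loss, tex_loss = 0.0, 0.0
    for ff, fr in zip(vgg_features(fake), vgg_features(real)):
        perc_loss = perc_loss + F.l1_loss(ff, fr)
        tex_loss = tex_loss + F.l1_loss(gram_matrix(ff), gram_matrix(fr))

    # Character attribute guidance: the rendered glyph should be recognized as
    # the target character (shown here for a single-character crop, as an
    # illustration), improving readability of the generated text.
    char_loss = F.cross_entropy(char_classifier(fake), char_label)

    return (w_font * font_loss + w_perc * perc_loss
            + w_tex * tex_loss + w_char * char_loss)
```

In a staged editing pipeline, a term of this form would be added to the usual reconstruction and adversarial losses of each stage, and the weights tuned on a validation set.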

Key words: text image editing, character recognition, font recognition, multi-task training, attribute guidance
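For context on the figures reported in the abstract (PSNR 25.48 dB, SSIM 0.842, MSE 0.004 3), below is a minimal sketch of how these image-quality metrics are commonly computed on float images in the range [0, 1]; it assumes numpy and scikit-image (>= 0.19) and is not the paper's evaluation code.

```python
# Reference-only sketch of PSNR, SSIM and MSE for H x W x C float images in [0, 1].
import numpy as np
from skimage.metrics import structural_similarity

def mse(img1, img2):
    return float(np.mean((img1 - img2) ** 2))

def psnr(img1, img2, max_val=1.0):
    m = mse(img1, img2)
    return float("inf") if m == 0 else 10.0 * np.log10(max_val ** 2 / m)

def ssim(img1, img2):
    # channel_axis=-1 assumes color images with the channel dimension last.
    return structural_similarity(img1, img2, data_range=1.0, channel_axis=-1)
```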

CLC Number: