Text image editing method based on font and character attribute guidance

doi:10.11772/j.issn.1001-9081.2022040520

Journal of Computer Applications, 2023, Vol. 43, Issue (5): 1416-1421. DOI: 10.11772/j.issn.1001-9081.2022040520

Special topic: Artificial intelligence

Jingchao CHEN¹,², Shugong XU¹, Youdong DING²

1. School of Communication and Information Engineering, Shanghai University, Shanghai 200444, China
2. Shanghai Film Academy, Shanghai University, Shanghai 200072, China

Received: 2022-04-15 Revised: 2022-06-09 Accepted: 2022-06-13 Online: 2022-07-01 Published: 2023-05-10
Corresponding author: Shugong XU (shugong@shu.edu.cn)
About the authors: CHEN Jingchao, born in 1997 in Shanghai, M. S. candidate. His research interests include text editing and font recognition.
XU Shugong, born in 1969 in Xiangyang, Hubei, Ph. D., professor. His research interests include wireless communication and pattern recognition.
DING Youdong, born in 1967 in Shanghang, Fujian, Ph. D., professor. His research interests include computer graphics and multimedia display.

Abstract

To address two problems in text image editing, namely inconsistent text style before and after editing and insufficient readability of the generated text, a text image editing method guided by font and character attributes was proposed. Firstly, a font attribute classifier, combined with font classification, perceptual and texture losses, was used to guide the generation of the text foreground style and improve the consistency of text style before and after editing. Secondly, a character attribute classifier, combined with a character classification loss, was used to guide the accurate generation of text glyphs, reducing text artifacts and generation errors and improving the readability of the generated text. Finally, an end-to-end fine-tuning training strategy was used to refine the results generated by the whole staged editing model. In comparison experiments, the proposed method achieved a Peak Signal-to-Noise Ratio (PSNR) of 25.48 dB and a Structural SIMilarity (SSIM) of 0.842, which are 2.57 dB and 0.055 higher than those of SRNet (Style Retention Network) and 2.11 dB and 0.046 higher than those of SwapText, respectively; its Mean Square Error (MSE) was 0.004 3, which is 0.003 1 and 0.002 4 lower than those of SRNet and SwapText, respectively. Experimental results show that the proposed method effectively improves the generation quality of text image editing.

Key words: text image editing, character recognition, font recognition, multi-task training, attribute guidance
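
The MSE and PSNR values quoted in the abstract follow standard image-quality definitions. The sketch below shows those definitions only; it assumes pixel intensities scaled to [0, 1] and flat pixel lists, neither of which is stated on this page, and the function names `mse` and `psnr` are illustrative rather than taken from the authors' code.

```python
import math


def mse(img_a, img_b):
    """Mean squared error between two equally sized images, given as flat
    sequences of pixel intensities (assumed here to be scaled to [0, 1])."""
    if len(img_a) != len(img_b) or not img_a:
        raise ValueError("images must be non-empty and the same size")
    return sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)


def psnr(img_a, img_b, max_value=1.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(img_a, img_b)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


# Toy four-pixel example (made-up values, not data from the paper).
reference = [0.0, 0.5, 1.0, 0.25]
edited = [0.1, 0.5, 0.9, 0.25]
print(f"MSE = {mse(reference, edited):.4f}, PSNR = {psnr(reference, edited):.2f} dB")
```

Dataset-level scores are usually averages of per-image values, so a reported mean PSNR need not equal the PSNR computed from the reported mean MSE.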

CLC number: TP183

Jingchao CHEN, Shugong XU, Youdong DING. Text image editing method based on font and character attribute guidance[J]. Journal of Computer Applications, 2023, 43(5): 1416-1421.

Figures and tables (11)

Tab. 1 PSNR and SSIM of each region of the edited results

Region	PSNR/dB	SSIM
Text region	17.30	0.70
Background region	32.10	0.95
Overall	22.91	0.79
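
One way to obtain per-region scores like those in Tab. 1 is to restrict the error computation to the pixels selected by a binary mask. The sketch below assumes such a text mask is available and that intensities lie in [0, 1]; the page does not say how the regions were delimited, and `region_psnr` is a hypothetical helper, not the authors' implementation.

```python
import math


def region_psnr(reference, edited, mask, max_value=1.0):
    """PSNR in dB computed only over pixels where `mask` is truthy.
    All three arguments are flat sequences of the same length."""
    if not (len(reference) == len(edited) == len(mask)):
        raise ValueError("reference, edited and mask must have the same length")
    sq_errors = [(r - e) ** 2 for r, e, m in zip(reference, edited, mask) if m]
    if not sq_errors:
        raise ValueError("mask selects no pixels")
    err = sum(sq_errors) / len(sq_errors)
    return math.inf if err == 0 else 10.0 * math.log10(max_value ** 2 / err)


# Toy example (made-up values): the first two pixels are text, the rest background.
reference = [0.2, 0.8, 0.5, 0.5]
edited = [0.4, 0.6, 0.5, 0.51]
text_mask = [1, 1, 0, 0]
background_mask = [1 - m for m in text_mask]
whole_mask = [1] * len(text_mask)

text_score = region_psnr(reference, edited, text_mask)
background_score = region_psnr(reference, edited, background_mask)
whole_score = region_psnr(reference, edited, whole_mask)
print(f"text {text_score:.2f} dB, background {background_score:.2f} dB, "
      f"whole {whole_score:.2f} dB")
```

In this toy case the text region scores lowest and the background highest, with the whole image in between, which is the same ordering as in Tab. 1.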

Fig. 1 Input images of network

Fig. 2 Text editing network architecture

Fig. 3 Output visualization of foreground transformation network

Fig. 4 Filling visualization of erased region

Fig. 5 Output visualization of background inpainting network

Fig. 6 Output visualization of foreground and background fusion network

Fig. 7 Visualization results of ablation study

Tab. 2 Quantitative evaluation results of ablation study (Δ is the change relative to the previous row)

SRNet	Font classifier	Character classifier	End-to-end fine-tuning	PSNR/dB	Δ PSNR/dB	SSIM	Δ SSIM	MSE	Δ MSE
√	×	×	×	22.91	n/a	0.787	n/a	0.007 4	n/a
√	√	×	×	23.93	1.02	0.813	0.026	0.006 1	-0.001 3
√	√	√	×	24.45	0.52	0.827	0.014	0.005 3	-0.000 8
√	√	√	√	25.48	1.03	0.842	0.015	0.004 3	-0.001 0
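
To see how the total PSNR gain splits across the three added components, the values in Tab. 2 can be turned into shares of the overall improvement. This is a small arithmetic sketch; the step labels and the function name `psnr_gain_shares` are illustrative, and the numbers are copied from the table.

```python
# (step label, PSNR in dB, SSIM, MSE) copied from Tab. 2, in ablation order.
steps = [
    ("SRNet baseline", 22.91, 0.787, 0.0074),
    ("+ font classifier", 23.93, 0.813, 0.0061),
    ("+ character classifier", 24.45, 0.827, 0.0053),
    ("+ end-to-end fine-tuning", 25.48, 0.842, 0.0043),
]


def psnr_gain_shares(rows):
    """Map each added step to its share of the total PSNR gain over the baseline."""
    total = rows[-1][1] - rows[0][1]
    return {
        name: (psnr_value - prev[1]) / total
        for prev, (name, psnr_value, _ssim, _mse) in zip(rows, rows[1:])
    }


shares = psnr_gain_shares(steps)
for name, share in shares.items():
    print(f"{name}: {share:.1%} of the total PSNR gain")
```

The shares come out at roughly 40%, 20% and 40%. Because the ablation is cumulative, they describe gains in this particular order of addition and may differ if the components were added in another order.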

Tab. 3 Quantitative evaluation results of comparison experiments

Method	PSNR/dB	SSIM	MSE
SRNet	22.91	0.787	0.007 4
SwapText	23.37	0.796	0.006 7
Proposed method	25.48	0.842	0.004 3
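
The improvements quoted in the abstract can be recomputed directly from Tab. 3. The sketch below only subtracts the tabulated values; the dictionary layout and the function name `improvement` are illustrative.

```python
# Metric values copied from Tab. 3.
results = {
    "SRNet": {"PSNR": 22.91, "SSIM": 0.787, "MSE": 0.0074},
    "SwapText": {"PSNR": 23.37, "SSIM": 0.796, "MSE": 0.0067},
    "Proposed": {"PSNR": 25.48, "SSIM": 0.842, "MSE": 0.0043},
}


def improvement(baseline, ours="Proposed"):
    """Absolute change of `ours` relative to `baseline` for every metric.
    Positive is better for PSNR and SSIM; negative is better for MSE."""
    return {
        metric: round(results[ours][metric] - results[baseline][metric], 4)
        for metric in results[ours]
    }


for baseline in ("SRNet", "SwapText"):
    print(baseline, improvement(baseline))
```

The differences are 2.57 dB, 0.055 and 0.003 1 against SRNet, and 2.11 dB, 0.046 and 0.002 4 against SwapText, with MSE lower in both cases.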

Fig. 8 Visualization results of text images in natural scenes
