Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (8): 2394-2400. DOI: 10.11772/j.issn.1001-9081.2021091564
Special Issue: Artificial Intelligence
Handwritten English text recognition based on convolutional neural network and Transformer
Xianjie ZHANG1,2, Zhiming ZHANG1
Received: 2021-09-03
Revised: 2022-01-05
Accepted: 2022-01-17
Online: 2022-08-09
Published: 2022-08-10
Contact: Zhiming ZHANG
About author: ZHANG Xianjie, born in 1991, M. S. candidate. His research interests include image processing and handwriting recognition.
CLC Number:
Xianjie ZHANG, Zhiming ZHANG. Handwritten English text recognition based on convolutional neural network and Transformer[J]. Journal of Computer Applications, 2022, 42(8): 2394-2400.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021091564
| Interception layer | Batch size | CER/% | WER/% | Test time per image/ms | Depth | Parameters/10⁶ |
|---|---|---|---|---|---|---|
| conv1 | 128 | 5.50 | 18.50 | 1.98 | 1 | 94.1 |
| conv2_x | 64 | 4.30 | 14.52 | 3.14 | 10 | 95.4 |
| conv3_x | 32 | 5.62 | 18.73 | 6.40 | 22 | 101.0 |
| conv4_x | 16 | 5.42 | 18.02 | 19.43 | 40 | 132.0 |
| conv5_x | 8 | 13.52 | 38.33 | 37.92 | 49 | 197.0 |

Tab. 1 Performance of different interception layers of SE-ResNet-50
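Tab. 1 indicates that truncating the SE-ResNet-50 backbone right after conv2_x gives the best CER/WER at a moderate depth and parameter count. The snippet below is a minimal sketch of this kind of layer interception in PyTorch under two assumptions: torchvision's plain ResNet-50 stands in for SE-ResNet-50 (it shares the conv1/conv2_x…conv5_x layout but lacks SE blocks), and all variable names are illustrative rather than taken from the paper.

```python
# Sketch: cut a ResNet-style backbone after conv2_x and use it as a feature extractor.
# Assumes a recent torchvision (weights=None API); plain ResNet-50 stands in for SE-ResNet-50.
import torch
from torch import nn
from torchvision.models import resnet50

backbone = resnet50(weights=None)
# Children order: conv1, bn1, relu, maxpool, layer1 (conv2_x), layer2 (conv3_x), ...
# Keeping the first five children truncates the network right after conv2_x.
conv2_x_extractor = nn.Sequential(*list(backbone.children())[:5])

with torch.no_grad():
    # A grayscale text-line image would be replicated to 3 channels before this step.
    x = torch.randn(1, 3, 64, 256)      # (batch, channels, height, width)
    feats = conv2_x_extractor(x)
    print(feats.shape)                  # torch.Size([1, 256, 16, 64]): stride 4, 256 channels
```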
| Model | Preprocessing | Language model | Lexicon | Pretraining | CER/% | WER/% |
|---|---|---|---|---|---|---|
| RNN+CTC[ | — | — | — | — | — | 20.49 |
| RNN+CTC[ | — | — | — | Synthetic | 6.34 | 16.19 |
|  |  |  | √ | Synthetic | 2.66 | 5.10 |
| RNN+CTC[ | √ | — | — | Synthetic | 4.88 | 12.61 |
|  |  |  | √ | Synthetic | 2.17 | 4.07 |
| RNN+Attention[ | √ | — | — | — | 8.80 | 23.80 |
|  |  |  | √ | — | 6.20 | 12.70 |
| Attention[ | √ | — | — | Synthetic | 5.79 | 15.15 |
|  |  | √ | √ | Synthetic | 4.27 | 8.36 |
| Attention[ | — | — | — | CTC | 12.60 | — |
| CTC+Attention[ | — | — | — | — | 6.60 | 18.20 |
| Proposed model | √ | — | — | — | 3.60 | 12.70 |

(A blank cell repeats the entry of the row above, reflecting the merged cells of the original table.)

Tab. 2 Comparison of evaluation results on IAM handwritten English word dataset
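The CER and WER columns in Tab. 2 are the standard character and word error rates: the Levenshtein edit distance between a prediction and its ground truth, normalised by the reference length in characters or in words. The sketch below shows one common way to compute them; the helper names are illustrative and not taken from the paper.

```python
# Sketch of CER/WER computation via Levenshtein edit distance (illustrative helpers).
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of tokens)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # substitution
        prev = curr
    return prev[-1]

def cer(ref, hyp):
    """Character error rate: character-level edit distance / reference length."""
    return edit_distance(list(ref), list(hyp)) / max(len(ref), 1)

def wer(ref, hyp):
    """Word error rate: word-level edit distance / number of reference words."""
    return edit_distance(ref.split(), hyp.split()) / max(len(ref.split()), 1)

print(cer("handwriting", "handwritting"))    # 1 insertion over 11 chars ≈ 0.091
print(wer("the quick fox", "the quik fox"))  # 1 wrong word over 3 ≈ 0.333
```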
| Error type | Proportion/% |
|---|---|
| One wrong letter inside a word | 41 |
| One wrong letter at the beginning or end of a word | 27 |
| Letter case error | 4 |
| Entire word wrong | 1 |
| Other | 27 |

Tab. 3 Proportion of types of prediction errors
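A breakdown like Tab. 3 can be produced by comparing each mispredicted word with its ground truth and assigning the mismatch to a category. The rules below are a hypothetical reading of those categories (one wrong letter inside the word, one wrong letter at the first or last position, case-only error, entire word wrong), not the authors' exact criteria.

```python
# Sketch: bucket a wrong prediction into Tab. 3-style categories (hypothetical rules).
def classify_error(ref: str, hyp: str) -> str:
    if ref == hyp:
        return "correct"
    if ref.lower() == hyp.lower():
        return "case error"
    if len(ref) == len(hyp):
        # Positions where the two words disagree.
        diffs = [i for i, (a, b) in enumerate(zip(ref, hyp)) if a != b]
        if len(diffs) == 1:
            return ("letter error at word boundary" if diffs[0] in (0, len(ref) - 1)
                    else "letter error inside word")
    # No shared characters at all -> treat the whole word as wrong.
    if not set(ref.lower()) & set(hyp.lower()):
        return "whole word error"
    return "other"

print(classify_error("there", "thare"))   # letter error inside word
print(classify_error("Moved", "moved"))   # case error
```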
1 | WANG Y T, XIAO W J, LI S. Offline handwritten text recognition using deep learning: a review[J]. Journal of Physics: Conference Series, 2021, 1848: No.012015. 10.1088/1742-6596/1848/1/012015 |
2 | MA Y Y, XIAO B. Offline handwritten text recognition based on CTC-Attention[J]. Laser and Optoelectronics Progress, 2021, 58(12): No.1210007. 10.3788/lop202158.1210007 |
3 | KUMAR M, JINDAL M K, SHARMA R K. Segmentation of isolated and touching characters in offline handwritten Gurmukhi script recognition[J]. International Journal of Information Technology and Computer Science, 2014, 6(2): 58-63. 10.5815/ijitcs.2014.02.08 |
4 | WANG Y W, DING X Q, LIU C S. Topic language model adaption for recognition of homologous offline handwritten Chinese text image[J]. IEEE Signal Processing Letters, 2014, 21(5): 550-553. 10.1109/lsp.2014.2308572 |
5 | ESPAÑA-BOQUERA S, CASTRO-BLEDA M J, GORBE-MOYA J, et al. Improving offline handwritten text recognition with hybrid HMM/ANN models[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(4): 767-779. 10.1109/tpami.2010.141 |
6 | WANG Z R, DU J, WANG W C, et al. A comprehensive study of hybrid neural network hidden Markov model for offline handwritten Chinese text recognition[J]. International Journal on Document Analysis and Recognition, 2018, 21(4): 241-251. 10.1007/s10032-018-0307-0 |
7 | WANG Q Q, LU Y. A sequence labeling convolutional network and its application to handwritten string recognition [C]// Proceedings of the 26th International Joint Conference on Artificial Intelligence. California: ijcai.org, 2017: 2950-2956. 10.24963/ijcai.2017/411 |
8 | SUEIRAS J, RUIZ V, SÁNCHEZ Á, et al. Offline continuous handwriting recognition using sequence to sequence neural networks[J]. Neurocomputing, 2018, 289: 119-128. 10.1016/j.neucom.2018.02.008 |
9 | DUTTA K, KRISHNAN P, MATHEW M, et al. Improving CNN-RNN hybrid networks for handwriting recognition [C]// Proceedings of the 16th International Conference on Frontiers in Handwriting Recognition. Piscataway: IEEE, 2018: 80-85. 10.1109/icfhr-2018.2018.00023 |
10 | GEETHA R, THILAGAM T, PADMAVATHY T. Effective offline handwritten text recognition model based on a sequence-to-sequence approach with CNN-RNN networks[J]. Neural Computing and Applications, 2021, 33(17): 10923-10934. |
11 | VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 6000-6010. |
12 | DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale[EB/OL]. (2021-06-03) [2022-01-04]. |
13 | WANG W H, XIE E Z, LI X, et al. Pyramid vision transformer: a versatile backbone for dense prediction without convolutions [C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 548-558. 10.1109/iccv48922.2021.00061 |
14 | WANG W H, XIE E Z, LI X, et al. PVT v2: improved baselines with pyramid vision transformer[J]. Computational Visual Media, 2022, 8(3): 415-424. 10.1007/s41095-022-0274-8 |
15 | RUSSAKOVSKY O, DENG J, SU H, et al. ImageNet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115(3): 211-252. 10.1007/s11263-015-0816-y |
16 | GIRSHICK R. Fast R-CNN [C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 1440-1448. 10.1109/iccv.2015.169 |
17 | REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks [C]// Proceedings of the 28th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2015: 91-99. |
18 | DAI J F, HE K M, SUN J. Instance-aware semantic segmentation via multi-task network cascades [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 3150-3158. 10.1109/cvpr.2016.343 |
19 | HE K M, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN [C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2980-2988. 10.1109/iccv.2017.322 |
20 | KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks [C]// Proceedings of the 25th International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2012: 1097-1105. |
21 | SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. (2015-04-10) [2022-01-04]. |
22 | SZEGEDY C, LIU W, JIA Y Q, et al. Going deeper with convolutions [C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 1-9. 10.1109/cvpr.2015.7298594 |
23 | HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. 10.1109/cvpr.2016.90 |
24 | XIE S N, GIRSHICK R, DOLLÁR P, et al. Aggregated residual transformations for deep neural networks [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 5987-5995. 10.1109/cvpr.2017.634 |
25 | HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 2261-2269. 10.1109/cvpr.2017.243 |
26 | HU J, SHEN L, ALBANIE S, et al. Gather-excite: exploiting feature context in convolutional neural networks [C]// Proceedings of the 32nd International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2018: 9423-9433 |
27 | HU J, SHEN L, SUN G. Squeeze-and-excitation networks [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7132-7141. 10.1109/cvpr.2018.00745 |
28 | DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding [C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1(Long and Short Papers). Stroudsburg, PA: Association for Computational Linguistics, 2019: 4171-4186. |
29 | DONG L H, XU S, XU B. Speech-Transformer: a no-recurrence sequence-to-sequence model for speech recognition [C]// Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2018: 5884-5888. 10.1109/icassp.2018.8462506 |
30 | KANG L, RIBA P, RUSIÑOL M, et al. Pay attention to what you read: non-recurrent handwritten text-line recognition[J]. Pattern Recognition, 2022, 129: No.108766. 10.1016/j.patcog.2022.108766 |
31 | MOSTAFA A, MOHAMED O, ASHRAF A, et al. OCFormer: a Transformer-based model for Arabic handwritten text recognition [C]// Proceedings of the 2021 International Mobile, Intelligent, and Ubiquitous Computing Conference. Piscataway: IEEE, 2021: 182-186. 10.1109/miucc52538.2021.9447608 |
32 | LY N T, NGUYEN C T, NAKAGAWA M. Attention augmented convolutional recurrent network for handwritten Japanese text recognition [C]// Proceedings of the 17th International Conference on Frontiers in Handwriting Recognition. Piscataway: IEEE, 2020: 163-168. 10.1109/icfhr2020.2020.00039 |
33 | GRAVES A, FERNÁNDEZ S, GOMEZ F, et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks [C]// Proceedings of the 23rd International Conference on Machine Learning. New York: ACM, 2006: 369-376. 10.1145/1143844.1143891 |
34 | GRAVES A, LIWICKI M, FERNÁNDEZ S, et al. A novel connectionist system for unconstrained handwriting recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, 31(5): 855-868. 10.1109/tpami.2008.137 |
35 | CHEN Z, WU Y C, YIN F, et al. Simultaneous script identification and handwriting recognition via multi-task learning of recurrent neural networks [C]// Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition. Piscataway: IEEE, 2017: 525-530. 10.1109/icdar.2017.92 |
36 | ZHAN H J, WANG Q Q, LU Y. Handwritten digit string recognition by combination of residual network and RNN-CTC [C]// Proceedings of the 2017 International Conference on Neural Information Processing, LNCS 10639. Cham: Springer, 2017: 583-591. |
37 | KRISHNAN P, DUTTA K, JAWAHAR C V. Word spotting and recognition using deep embedding [C]// Proceedings of the 13th IAPR International Workshop on Document Analysis Systems. Piscataway: IEEE, 2018: 1-6. 10.1109/das.2018.70 |
38 | BA J L, KIROS J R, HINTON G E. Layer normalization[EB/OL]. (2016-07-21) [2022-01-04]. |
39 | MARTI U V, BUNKE H. The IAM-database: an English sentence database for offline handwriting recognition[J]. International Journal on Document Analysis and Recognition, 2002, 5(1): 39-46. 10.1007/s100320200071 |
40 | LUO C J, ZHU Y Z, JIN L W, et al. Learn to augment: joint data augmentation and network optimization for text recognition [C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 13743-13752. 10.1109/cvpr42600.2020.01376 |
41 | MOR N, WOLF L. Confidence prediction for lexicon-free OCR [C]// Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision. Piscataway: IEEE, 2018: 218-225. 10.1109/wacv.2018.00030 |
42 | BLUCHE T, LOURADOUR J, MESSINA R. Scan, attend and read: end-to-end handwritten paragraph recognition with MDLSTM attention [C]// Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition. Piscataway: IEEE, 2017: 1050-1055. 10.1109/icdar.2017.174 |
43 | KANG L, RIBA P, VILLEGAS M, et al. Candidate fusion: integrating language modelling into a sequence-to-sequence handwritten word recognition architecture[J]. Pattern Recognition, 2021, 112: No.107790. 10.1016/j.patcog.2020.107790 |