Journal of Computer Applications, 2022, Vol. 42, Issue 8: 2394-2400. DOI: 10.11772/j.issn.1001-9081.2021091564

• Artificial intelligence •

Handwritten English text recognition based on convolutional neural network and Transformer

Xianjie ZHANG1,2, Zhiming ZHANG1

  1. College of Information Engineering, Engineering University of PAP, Xi’an, Shaanxi 710086, China
  2. Postgraduate Brigade, Engineering University of PAP, Xi’an, Shaanxi 710086, China
  • Received: 2021-09-03  Revised: 2022-01-05  Accepted: 2022-01-17  Online: 2022-08-09  Published: 2022-08-10
  • Contact: Zhiming ZHANG
  • About the authors: ZHANG Xianjie, born in 1991, from Mianyang, Sichuan, M. S. candidate. His research interests include image processing and handwriting recognition.
    ZHANG Zhiming, born in 1973, from Weifang, Shandong, Ph. D., professor. His research interests include big data and image processing.


Abstract:

Handwritten text recognition transcribes handwritten documents into editable digital documents. However, because writing styles vary widely, document structures are diverse, and character segmentation is often inaccurate, neural-network-based handwritten English text recognition still faces many challenges. To address these problems, a handwritten English text recognition model based on Convolutional Neural Network (CNN) and Transformer was proposed. Firstly, a CNN was used to extract features from the input image. Then, the features were fed into a Transformer encoder to obtain a prediction for each frame of the feature sequence. Finally, a Connectionist Temporal Classification (CTC) decoder was used to produce the final prediction. Extensive experiments were conducted on the public Institut für Angewandte Mathematik (IAM) handwritten English word dataset. Experimental results show that the model achieves a Character Error Rate (CER) of 3.60% and a Word Error Rate (WER) of 12.70%, which verifies the feasibility of the proposed model.

Key words: handwritten English text recognition, deep learning, Convolutional Neural Network (CNN), Transformer, Connectionist Temporal Classification (CTC), attention, segmentation-free
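
The last stage of the pipeline described in the abstract (per-frame predictions, then CTC decoding, then CER/WER scoring) can be illustrated with a minimal sketch. The abstract does not say which CTC decoding strategy the authors used, so greedy best-path decoding is assumed here. The blank index, the toy alphabet, and all function names are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

BLANK = 0  # assumption: class index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path decoding: argmax per frame, merge repeats, drop blanks."""
    best = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    chars: List[str] = []
    prev = BLANK
    for idx in best:
        # Emit a character only when the class changes and is not blank.
        if idx != prev and idx != BLANK:
            chars.append(alphabet[idx - 1])  # classes 1..N map to alphabet[0..N-1]
        prev = idx
    return "".join(chars)


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (characters or words)."""
    prev_row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        row = [i]
        for j, h in enumerate(hyp, start=1):
            row.append(min(prev_row[j] + 1,               # deletion
                           row[j - 1] + 1,                # insertion
                           prev_row[j - 1] + (r != h)))   # substitution or match
        prev_row = row
    return prev_row[-1]


def error_rate(refs: Sequence[Sequence], hyps: Sequence[Sequence]) -> float:
    """Total edit distance divided by total reference length.

    Pass strings to get CER, or lists of words to get WER.
    """
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total


# Toy per-frame scores over the classes [blank, 'a', 'b', 'c'].
frames = [
    [0.1, 0.8, 0.05, 0.05],  # a
    [0.1, 0.7, 0.1, 0.1],    # a (repeat, merged)
    [0.9, 0.05, 0.03, 0.02], # blank (separates the two a's)
    [0.2, 0.6, 0.1, 0.1],    # a
    [0.1, 0.1, 0.7, 0.1],    # b
    [0.1, 0.1, 0.6, 0.2],    # b (repeat, merged)
]
decoded = ctc_greedy_decode(frames, "abc")
cer = error_rate(["hello"], ["hallo"])
wer = error_rate([["the", "cat"]], [["the", "hat"]])
```

Because the blank frame separates the two runs of `a`, the repeated letter survives decoding, which is what lets CTC emit doubled characters without explicit character segmentation.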

