Journal of Computer Applications, 2023, Vol. 43, Issue (4): 1297-1302. DOI: 10.11772/j.issn.1001-9081.2022020278

• Multimedia computing and computer simulation •

Handwritten mathematical expression recognition model based on attention mechanism and encoder-decoder

Lu CHEN¹, Daoxi CHEN², Yiming LU³, Weizhong LU¹,⁴

  1. School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China
  2. Department of Information Engineering, Suzhou Technician College Jiangsu Province, Suzhou, Jiangsu 215009, China
  3. Tianping College of Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China
  4. Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency (Suzhou University of Science and Technology), Suzhou, Jiangsu 215009, China
  • Received: 2022-03-11 Revised: 2022-05-18 Accepted: 2022-05-23 Online: 2022-08-16 Published: 2023-04-10
  • Contact: Weizhong LU
  • About the authors: CHEN Lu, born in 2000 in Chaohu, Anhui. His research interests include deep learning and computer vision.
    CHEN Daoxi, born in 1974 in Hanshan, Anhui, M. S., professor. His research interests include artificial intelligence.
    LU Yiming, born in 1990 in Suzhou, Jiangsu, M. S., assistant lecturer. His research interests include artificial intelligence and quantitative trading.
  • Supported by:
    National Natural Science Foundation of China (61472267)

Abstract:

Existing Handwritten Mathematical Expression Recognition (HMER) methods lose image resolution and feature information after repeated pooling in a Convolutional Neural Network (CNN), which leads to parsing errors. To address this problem, an attention-based encoder-decoder model for HMER was proposed. Firstly, a Densely connected convolutional Network (DenseNet) was used as the encoder; its dense connections strengthen feature extraction, promote gradient propagation, and alleviate vanishing gradients. Secondly, a Gated Recurrent Unit (GRU) was used as the decoder, and an attention mechanism was introduced to allocate attention to different regions of the image, so that symbol recognition and structural analysis are performed accurately. Finally, handwritten mathematical expression images were encoded, and the encoding results were decoded into LaTeX sequences. Experimental results on the Competition on Recognition of Online Handwritten Mathematical Expressions (CROHME) dataset show that the proposed model reaches a recognition rate of 40.39%, and recognition rates of 52.74%, 58.82% and 62.98% within the three levels of allowable error, respectively. Compared with the Bidirectional Long Short-Term Memory (BLSTM) network model, the proposed model improves the recognition rate by 3.17 percentage points, and by 8.52, 11.56 and 12.78 percentage points within the three levels of allowable error, respectively. These results indicate that the proposed model can accurately parse handwritten mathematical expression images, generate LaTeX sequences, and improve the recognition rate.
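The attention step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the scoring function, so the dot-product scoring below is an assumption, and the names `attend`, `features` and `state` are hypothetical. The sketch only shows how a decoder state can be turned into weights over encoder image regions and a weighted context vector; it is not the authors' implementation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(features, state):
    """Soft attention over image regions.

    features: list of region vectors produced by the encoder (assumed shape L x D)
    state:    current decoder hidden vector (assumed to share dimension D)
    Returns the attention weights and the weighted context vector.
    """
    # Assumption: dot-product scoring; the paper may use a different scorer.
    scores = [sum(f * s for f, s in zip(region, state)) for region in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [
        sum(w * region[d] for w, region in zip(weights, features))
        for d in range(dim)
    ]
    return weights, context

# Toy example: three image regions with 2-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [2.0, 0.0]
weights, context = attend(features, state)
```

In a full decoder, the context vector would be fed to the GRU together with the previously emitted LaTeX token, and the weights would be recomputed at every decoding step.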

Key words: Handwritten Mathematical Expression Recognition (HMER), encoder-decoder, Densely connected convolutional Network (DenseNet), Gated Recurrent Unit (GRU), attention mechanism
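The recognition rates quoted in the abstract, including those within three levels of allowable error, can be illustrated with a small sketch. The abstract does not define the error levels; the code below assumes they mean at most 1, 2 and 3 token-level errors between the predicted and reference LaTeX sequences, measured by edit distance. The function names and the toy data are hypothetical.

```python
def edit_distance(pred, truth):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(truth) + 1))
    for i, p in enumerate(pred, start=1):
        curr = [i]
        for j, t in enumerate(truth, start=1):
            cost = 0 if p == t else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def recognition_rates(pairs, max_errors=3):
    """Percentage of expressions with at most k errors, for k = 0..max_errors."""
    distances = [edit_distance(p, t) for p, t in pairs]
    return [
        100.0 * sum(d <= k for d in distances) / len(distances)
        for k in range(max_errors + 1)
    ]

# Toy (prediction, reference) pairs of whitespace-separated LaTeX tokens.
pairs = [
    ("x ^ 2 + 1".split(), "x ^ 2 + 1".split()),            # exact match
    ("x _ 2 + 1".split(), "x ^ 2 + 1".split()),            # one substitution
    (r"\frac a b".split(), r"\frac { a } { b }".split()),  # four missing braces
    ("a + b".split(), "a - b + c".split()),                # three errors
]
rates = recognition_rates(pairs)
```

Under this assumption, `rates[0]` corresponds to the exact-match recognition rate and `rates[1]` to `rates[3]` to the three tolerance levels.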

CLC Number: