Journal of Computer Applications, 2023, Vol. 43, Issue 4: 1297-1302. DOI: 10.11772/j.issn.1001-9081.2022020278

Section: Multimedia Computing and Computer Simulation

Handwritten mathematical expression recognition model based on attention mechanism and encoder-decoder

Lu CHEN¹, Daoxi CHEN², Yiming LU³, Weizhong LU¹,⁴

1. School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China
2. Department of Information Engineering, Suzhou Technician College Jiangsu Province, Suzhou, Jiangsu 215009, China
3. Tianping College of Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China
4. Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency (Suzhou University of Science and Technology), Suzhou, Jiangsu 215009, China

Received: 2022-03-11 Revised: 2022-05-18 Accepted: 2022-05-23 Online: 2022-08-16 Published: 2023-04-10
Corresponding author: Weizhong LU
About the authors:
CHEN Lu, born in 2000 in Chaohu, Anhui. His research interests include deep learning and computer vision.
CHEN Daoxi, born in 1974 in Hanshan, Anhui, M. S., professor. His research interests include artificial intelligence.
LU Yiming, born in 1990 in Suzhou, Jiangsu, M. S., assistant lecturer. His research interests include artificial intelligence and quantitative trading.
Supported by: National Natural Science Foundation of China (61472267)

Abstract

Existing Handwritten Mathematical Expression Recognition (HMER) methods lose image resolution and feature information after repeated pooling in a Convolutional Neural Network (CNN), which leads to parsing errors. To address this problem, an attention-based encoder-decoder model for HMER was proposed. Firstly, a Densely connected convolutional Network (DenseNet) was used as the encoder, where dense connections strengthen feature extraction, promote gradient propagation, and alleviate vanishing gradients. Secondly, a Gated Recurrent Unit (GRU) was used as the decoder, and an attention mechanism was introduced to allocate attention to different regions of the image, so that symbol recognition and structural analysis could be performed accurately. Finally, handwritten mathematical expression images were encoded, and the encoding results were decoded into LaTeX sequences. Experimental results on the Competition on Recognition of Online Handwritten Mathematical Expressions (CROHME) dataset show that the proposed model raises the recognition rate to 40.39%, and to 52.74%, 58.82% and 62.98% within the three levels of allowable error, respectively. Compared with the Bidirectional Long Short-Term Memory (BLSTM) network model, the proposed model improves the recognition rate by 3.17 percentage points, and by 8.52, 11.56 and 12.78 percentage points within the three levels of allowable error, respectively. These results indicate that the proposed model can accurately parse handwritten mathematical expression images, generate LaTeX sequences, and improve the recognition rate.
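
As an illustration of the attention step described in the abstract, the following is a minimal sketch of one decoding step that weights the locations of an encoder feature map and forms a context vector. The additive scoring form, the weight names `W_a`, `W_h` and `v`, and the toy sizes are assumptions made for this example; the abstract does not specify the exact attention formulation or dimensions used by the model.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_step(features, hidden, W_a, W_h, v):
    """One additive-attention step (assumed form, not confirmed by the source).

    features: annotation vectors, one per location of the encoder feature map
    hidden:   current decoder (e.g. GRU) hidden state
    Returns (weights, context); the weights sum to 1 over all locations.
    """
    projected_hidden = matvec(W_h, hidden)
    scores = []
    for a in features:
        projected_a = matvec(W_a, a)
        energy = [math.tanh(p + q) for p, q in zip(projected_a, projected_hidden)]
        scores.append(sum(vi * ei for vi, ei in zip(v, energy)))
    weights = softmax(scores)
    dim = len(features[0])
    # Context vector: attention-weighted average of the annotation vectors.
    context = [sum(w * a[d] for w, a in zip(weights, features)) for d in range(dim)]
    return weights, context


# Hypothetical toy sizes: 6 image locations, 4-dim features,
# 3-dim hidden state, 5-dim attention space.
rng = random.Random(0)
L, D, H, K = 6, 4, 3, 5
features = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(L)]
hidden = [rng.uniform(-1, 1) for _ in range(H)]
W_a = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(K)]
W_h = [[rng.uniform(-1, 1) for _ in range(H)] for _ in range(K)]
v = [rng.uniform(-1, 1) for _ in range(K)]

weights, context = attention_step(features, hidden, W_a, W_h, v)
print(len(weights), len(context))  # 6 4
```

In a full decoder, the context vector would be fed to the GRU together with the previously emitted LaTeX token, and the weights are what an attention visualization would display for each time step.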

Key words: Handwritten Mathematical Expression Recognition (HMER), encoder-decoder, Densely connected convolutional Network (DenseNet), Gated Recurrent Unit (GRU), attention mechanism

CLC number: TP391

Cite this article: Lu CHEN, Daoxi CHEN, Yiming LU, Weizhong LU. Handwritten mathematical expression recognition model based on attention mechanism and encoder-decoder[J]. Journal of Computer Applications, 2023, 43(4): 1297-1302.

Figures and Tables (7)

Fig. 1 Encoder-decoder model for HMER based on attention mechanism

Fig. 2 Structure of DenseNet encoder

Fig. 3 Structures of dense layer and transition layer

Fig. 4 Negative log-likelihood loss on the training set and recognition rate on the test set

Tab. 1 Recognition rates and recognition rates within allowable errors of several models (%)

Model	ExpRate	Error ≤ 1	Error ≤ 2	Error ≤ 3
Model Ⅰ	37.22	44.22	47.26	50.20
Model Ⅱ	15.01	22.31	26.57	27.69
Model Ⅳ	18.97	28.19	32.35	33.37
Model Ⅴ	18.97	26.37	30.83	32.96
Model Ⅵ	25.66	33.16	35.90	37.32
Model Ⅶ	26.06	33.87	38.54	39.96
Proposed model	40.39	52.74	58.82	62.98
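
To make the columns of Tab. 1 concrete, the sketch below shows one way an expression recognition rate and its error-tolerant variants could be computed, and it re-derives the percentage-point gains quoted in the abstract. It assumes that errors are counted as token-level edit distance between the predicted and reference LaTeX sequences, which this page does not confirm; the helper names and the four sample pairs are made up for illustration. The gains in the abstract equal the difference between the proposed model's row and the Model Ⅰ row, which suggests Model Ⅰ is the BLSTM baseline, although the table itself does not label it.

```python
def edit_distance(pred, ref):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(ref) + 1))
    for i, p in enumerate(pred, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if p == r else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def rate_within(pairs, max_errors):
    """Percentage of (prediction, reference) pairs with at most max_errors edits."""
    hits = sum(1 for pred, ref in pairs if edit_distance(pred, ref) <= max_errors)
    return 100.0 * hits / len(pairs)


# Made-up (prediction, reference) pairs of LaTeX tokens.
pairs = [
    (r"x ^ { 2 } + 1".split(), r"x ^ { 2 } + 1".split()),  # exact match
    (r"a + b".split(), r"a - b".split()),                   # 1 substitution
    (r"\sqrt { x }".split(), r"\sqrt { x + y }".split()),   # 2 missing tokens
    (r"y = 2".split(), r"\sum _ { i } x".split()),          # no tokens in common
]

# max_errors = 0 corresponds to ExpRate; 1, 2, 3 to the Error <= k columns.
rates = [rate_within(pairs, k) for k in range(4)]
print(rates)  # [25.0, 50.0, 75.0, 75.0]

# Percentage-point gains of the proposed model over Model I, from Tab. 1.
proposed = [40.39, 52.74, 58.82, 62.98]
model_1 = [37.22, 44.22, 47.26, 50.20]
gains = [round(p - b, 2) for p, b in zip(proposed, model_1)]
print(gains)
```

Because each tolerance level includes all expressions counted at the stricter levels, the rates in every row can only stay equal or grow from left to right, which is the pattern seen in both tables.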

Tab. 2 Ablation experimental results (%)

Model	ExpRate	Error ≤ 1	Error ≤ 2	Error ≤ 3
HMER-NA	35.57	43.71	49.39	54.87
HMER	40.39	52.74	58.82	62.98

Fig. 5 Attention visualization of LaTeX sequence prediction at each time step
