Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (12): 3679-3685. DOI: 10.11772/j.issn.1001-9081.2021101805

• Artificial intelligence •

Neural machine translation integrating bidirectional-dependency self-attention mechanism

Zhijin LI1,2, Hua LAI1,2, Yonghua WEN1,2, Shengxiang GAO1,2

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650504, China
  2. Yunnan Key Laboratory of Artificial Intelligence (Kunming University of Science and Technology), Kunming, Yunnan 650504, China
  • Received: 2021-10-22 Revised: 2022-01-06 Accepted: 2022-01-24 Online: 2022-04-26 Published: 2022-12-10
  • Contact: Hua LAI
  • About author: LI Zhijin, born in 1997 in Dalian, Liaoning, M. S. candidate. His research interests include machine translation and natural language processing.
    WEN Yonghua (Bai ethnicity), born in 1979 in Dali, Yunnan, Ph. D. candidate. His research interests include machine translation.
    GAO Shengxiang, born in 1977 in Dali, Yunnan, Ph. D., associate professor. Her research interests include machine translation, natural language processing, and information retrieval.
  • Supported by:
    National Natural Science Foundation of China (61732005); Yunnan Province Major Science and Technology Special Project (202002AD080001-5); Yunnan Province High-tech Industry Special Project (201606)

Abstract:

Aiming at the problem of resource scarcity in neural machine translation, a method for fusing dependency syntactic knowledge based on a Bidirectional-Dependency self-attention mechanism (Bi-Dependency) was proposed. Firstly, an external parser was used to parse the source sentence and obtain dependency parse data. Then, the dependency parse data were transformed into the position vector of parent words and the weight matrix of child words. Finally, this dependency knowledge was integrated into the multi-head attention mechanism of the Transformer encoder. With Bi-Dependency, the translation model was able to attend simultaneously to dependency information in both directions: parent word to child word and child word to parent word. Experimental results on bidirectional translation show that, compared with the Transformer model, in the rich-resource setting the proposed method improves the BLEU (BiLingual Evaluation Understudy) score in the two directions of Chinese-Thai translation by 1.07 and 0.86 respectively, and in the two directions of Chinese-English translation by 0.79 and 0.68 respectively; in the low-resource setting, the BLEU score increases by 0.51 and 1.06 respectively on Chinese-Thai translation, and by 1.04 and 0.40 respectively on Chinese-English translation. These results indicate that Bi-Dependency provides the model with richer dependency information and can effectively improve translation performance.
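The pipeline summarized above (parse the source sentence, derive a parent-word position vector and a child-word weight matrix, then bias the encoder's attention in both dependency directions) can be sketched in code. The following is a minimal single-head PyTorch illustration under stated assumptions: the function names dependency_matrices and bi_dependency_attention, the additive-bias fusion, and the bias strength are hypothetical choices for exposition, not the paper's released implementation, and the paper applies the fusion inside the full multi-head attention rather than a single head.

    # Illustrative sketch only: names, shapes, and the additive-bias fusion
    # below are assumptions, not the authors' released code.
    import torch
    import torch.nn.functional as F

    def dependency_matrices(heads, seq_len):
        # heads[i] is the (0-based) index of token i's parent word, as
        # produced by an external dependency parser; the root points to
        # itself. Returns the parent-word position vector and the
        # child-word weight matrix described in the abstract.
        parent_pos = torch.tensor(heads, dtype=torch.long)       # (seq_len,)
        child_weight = torch.zeros(seq_len, seq_len)
        for child, parent in enumerate(heads):
            if parent != child:                                   # skip the root's self-loop
                child_weight[parent, child] = 1.0                 # parent -> child edge
        return parent_pos, child_weight

    def bi_dependency_attention(q, k, v, parent_pos, child_weight, bias=1.0):
        # One attention head whose scores are raised on both dependency
        # directions before the softmax; `bias` is a tunable assumption.
        d = q.size(-1)
        scores = q @ k.transpose(0, 1) / d ** 0.5                 # (seq_len, seq_len)
        parent_weight = torch.zeros_like(child_weight)
        parent_weight[torch.arange(len(parent_pos)), parent_pos] = 1.0  # child -> parent edge
        scores = scores + bias * (child_weight + parent_weight)   # fuse both directions
        return F.softmax(scores, dim=-1) @ v

    # Toy usage: "the cat sat" with parser output heads = [1, 2, 2]
    # ("the" -> "cat", "cat" -> "sat", root "sat" points to itself).
    parent_pos, child_weight = dependency_matrices([1, 2, 2], seq_len=3)
    q = k = v = torch.randn(3, 8)
    out = bi_dependency_attention(q, k, v, parent_pos, child_weight)

Because the bias is added symmetrically from both the child-word matrix and the parent-position view, each token's attention is pulled toward its parent and its children at once, which is the bidirectional property the abstract attributes to Bi-Dependency.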

Key words: neural machine translation, bidirectional-dependency attention, multi-head attention, parent word, child word
