Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (11): 3386-3394. DOI: 10.11772/j.issn.1001-9081.2021111963
Special Issue: The 9th CCF Conference on Big Data (CCF Bigdata 2021)
• CCF Bigdata 2021 •

Neural machine translation method based on source language syntax enhanced decoding
Longchao GONG1,2, Junjun GUO1,2, Zhengtao YU1,2
Received: 2021-11-19
Revised: 2021-11-25
Accepted: 2021-12-06
Online: 2021-12-31
Published: 2022-11-10
Contact: Junjun GUO
About author: GONG Longchao, born in 1997 in Nanyang, Henan, M. S. candidate, CCF member. His research interests include natural language processing and machine translation.
Longchao GONG, Junjun GUO, Zhengtao YU. Neural machine translation method based on source language syntax enhanced decoding[J]. Journal of Computer Applications, 2022, 42(11): 3386-3394.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021111963
Tab. 1 Statistics of corpus size in experiments

| Split | NC11 | WMT18 | IWSLT14 | IWSLT15 |
|---|---|---|---|---|
| Training set | 226 000 | 207 000 | 160 000 | 133 000 |
| Validation set | 2 169 | 3 000 | 7 283 | 1 553 |
| Test set | 2 999 | 3 007 | 6 750 | 1 268 |
Tab. 2 BLEU values of different machine translation methods on various datasets

| Model | NC11 En-De | NC11 De-En | WMT18 En-Tr |
|---|---|---|---|
| Mixed Enc. | — | — | 9.60 |
| Multi-Task | — | — | 10.60 |
| Transformer | 25.00 | 26.60 | 13.10 |
| +Multi-Task | 24.80 | 26.70 | 14.00 |
| +S&H | 25.50 | 26.80 | 13.00 |
| +LISA | 25.30 | 27.10 | 13.60 |
| +PASCAL | 25.90 | 27.40 | 14.00 |
| SSED | 25.97 | 28.44 | 16.51 |

| Model | IWSLT14 De-En valid | IWSLT14 De-En test | IWSLT15 En-Vi tst2012 | IWSLT15 En-Vi tst2013 |
|---|---|---|---|---|
| ELMo | — | — | — | 29.30 |
| CVT | — | — | — | 29.60 |
| SAWR | — | — | — | 29.09 |
| C-MLM | 36.93 | 35.63 | 27.85 | 31.51 |
| Transformer | 35.27 | 34.09 | 27.03 | 30.76 |
| Tied-Transform | — | 35.52 | — | — |
| Dynamic Conv | — | 35.20 | — | — |
| Macaron | — | 35.40 | — | — |
| BERT-fused | — | 36.11 | — | — |
| SSED | 36.85 | 35.53 | 27.95 | 31.60 |
Tab. 3 BLEU values of different fusion methods on IWSLT15 English-Vietnamese tasks

| Method | tst2012 | tst2013 |
|---|---|---|
| Transformer | 27.03 | 30.76 |
| Enc+Syn | 24.87 | 28.04 |
| Syn+Enc | 26.11 | 28.73 |
| Enc//Syn | 27.10 | 31.09 |
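The method names in Tab. 3 suggest serial fusion (Enc+Syn: encoder memory first, then syntax; Syn+Enc: the reverse) versus parallel fusion (Enc//Syn). A minimal PyTorch sketch of these three orders, assuming the decoder cross-attends to an encoder memory `enc` and a syntax memory `syn`; the module names, shapes, and combination layer are hypothetical illustrations, not the paper's code:

```python
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    """Sketch of the three fusion orders compared in Tab. 3 (hypothetical shapes).

    Enc+Syn : cross-attend to the encoder output first, then to syntax.
    Syn+Enc : cross-attend to syntax first, then to the encoder output.
    Enc//Syn: attend to both in parallel and combine the two contexts.
    """
    def __init__(self, d_model: int = 512, n_heads: int = 8, mode: str = "Enc//Syn"):
        super().__init__()
        self.mode = mode
        self.attn_enc = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.attn_syn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.combine = nn.Linear(2 * d_model, d_model)  # used by the parallel variant

    def forward(self, dec, enc, syn):
        if self.mode == "Enc+Syn":        # serial: encoder context feeds syntax attention
            ctx, _ = self.attn_enc(dec, enc, enc)
            ctx, _ = self.attn_syn(ctx, syn, syn)
        elif self.mode == "Syn+Enc":      # serial, opposite order
            ctx, _ = self.attn_syn(dec, syn, syn)
            ctx, _ = self.attn_enc(ctx, enc, enc)
        else:                             # "Enc//Syn": parallel attention, then fuse
            c_enc, _ = self.attn_enc(dec, enc, enc)
            c_syn, _ = self.attn_syn(dec, syn, syn)
            ctx = self.combine(torch.cat([c_enc, c_syn], dim=-1))
        return ctx

# Example: fuse decoder states (batch 2, 7 target steps) with two 9-step memories.
block = FusionBlock(mode="Enc//Syn")
out = block(torch.randn(2, 7, 512), torch.randn(2, 9, 512), torch.randn(2, 9, 512))
```

On this reading, only the parallel combination (Enc//Syn) improves over the Transformer baseline in Tab. 3; both serial orders fall below it.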
Tab. 4 BLEU values of different integration methods on IWSLT15 English-Vietnamese tasks

| Method | tst2012 | tst2013 |
|---|---|---|
| Transformer | 27.03 | 30.76 |
| Average | 26.37 | 29.91 |
| Gate | 26.79 | 31.09 |
| Highway | 26.19 | 29.82 |
| Linear | 27.10 | 31.09 |
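Tab. 4 varies how the two context vectors are merged. A hedged sketch of the four combiners named there, assuming each receives an encoder context `c_enc` and a syntactic context `c_syn` of equal width; the exact gating forms in the paper may differ from these common formulations:

```python
import torch
import torch.nn as nn

class Integrator(nn.Module):
    """Sketch of the four merge strategies in Tab. 4 (hypothetical layer names)."""
    def __init__(self, d_model: int = 512):
        super().__init__()
        self.gate_proj = nn.Linear(2 * d_model, d_model)  # for Gate / Highway
        self.linear = nn.Linear(2 * d_model, d_model)     # for Linear / Highway

    def forward(self, c_enc, c_syn, method: str = "Linear"):
        both = torch.cat([c_enc, c_syn], dim=-1)
        if method == "Average":              # unweighted mean of the two contexts
            return (c_enc + c_syn) / 2
        g = torch.sigmoid(self.gate_proj(both))
        if method == "Gate":                 # learned element-wise convex combination
            return g * c_enc + (1 - g) * c_syn
        if method == "Highway":              # gate between c_syn and a transform of both
            return g * c_syn + (1 - g) * torch.tanh(self.linear(both))
        return self.linear(both)             # "Linear": one projection of the concatenation
```

Under this reading, only the simple learned projection (Linear) matches the best fusion result (27.10/31.09); the parameter-free average and the heavier gated variants trail it.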
Tab. 5 BLEU values of introducing syntactic information to different decoding layers on IWSLT15 English-Vietnamese tasks

| Layer(s) | tst2012 | tst2013 |
|---|---|---|
| Base | 27.03 | 30.76 |
| 1 | 27.24 | 31.11 |
| 2 | 27.32 | 31.20 |
| 3 | 27.37 | 31.00 |
| 4 | 28.07 | 31.10 |
| 5 | 27.95 | 31.60 |
| 6 | 27.40 | 31.47 |
| Enc5 | 27.86 | 31.24 |
| Gate5 | 27.77 | 31.49 |
| 1-6 | 27.10 | 31.09 |
| 1-2 | 27.16 | 31.13 |
| 1-3 | 27.37 | 31.01 |
| 1-4 | 26.72 | 31.07 |
| 4-6 | 26.67 | 30.78 |
| 5-6 | 27.16 | 31.10 |
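Tab. 5 reads as an ablation over which decoder layer(s) receive the syntactic context: single layers 1-6, contiguous ranges, and the Enc5/Gate5 variants. A sketch of how such a per-layer switch could be wired, assuming decoder-layer modules that accept an optional syntax memory; the `syntax_layers` argument and the layer signature are hypothetical, not the paper's implementation:

```python
import torch.nn as nn

class SyntaxAwareDecoder(nn.Module):
    """Decoder stack where only the layers listed in syntax_layers attend to
    the syntactic representation (sketch; not the paper's code)."""
    def __init__(self, layers, syntax_layers=(5,)):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        self.syntax_layers = set(syntax_layers)  # 1-indexed, matching Tab. 5

    def forward(self, dec, enc, syn):
        for i, layer in enumerate(self.layers, start=1):
            # Selected layers see both memories; the rest see the encoder output alone.
            dec = layer(dec, enc, syn if i in self.syntax_layers else None)
        return dec
```

The table suggests injecting syntax into a single upper-middle layer (4 or 5) helps most, while feeding it to every layer (1-6) or to the top layers only (4-6) yields little or no gain over the base model.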
Tab. 6 Examples of comparison between original machine translation results and syntax-enhanced machine translation results

| Example | Source sentence | Reference | Original translation | Syntax-enhanced |
|---|---|---|---|---|
| Example 1 | dies ist mein supermarkt. kein großer. | this is my supermarket. not such a big one. | this is my supermarket. this is not a big deal. | this is my supermarket. it is not a big one. |
| Example 2 | ich habe ihn hier auf meinem laptop. | i have got it here on my laptop. | i have it here on my laptop. | i have got it here on my laptop. |