Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (11): 3386-3394. DOI: 10.11772/j.issn.1001-9081.2021111963
Special Issue: The 9th CCF Conference on Big Data (CCF Bigdata 2021)
• CCF Bigdata 2021 •

Neural machine translation method based on source language syntax enhanced decoding
Longchao GONG1,2, Junjun GUO1,2, Zhengtao YU1,2
Received: 2021-11-19
Revised: 2021-11-25
Accepted: 2021-12-06
Online: 2021-12-31
Published: 2022-11-10
Contact: Junjun GUO
About author: GONG Longchao, born in 1997 in Nanyang, Henan, M. S. candidate, CCF member. His research interests include natural language processing and machine translation.
Longchao GONG, Junjun GUO, Zhengtao YU. Neural machine translation method based on source language syntax enhanced decoding[J]. Journal of Computer Applications, 2022, 42(11): 3386-3394.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021111963
Tab. 1 Statistics of corpus size in experiments

| Split | NC11 | WMT18 | IWSLT14 | IWSLT15 |
|---|---|---|---|---|
| Training set | 226 000 | 207 000 | 160 000 | 133 000 |
| Validation set | 2 169 | 3 000 | 7 283 | 1 553 |
| Test set | 2 999 | 3 007 | 6 750 | 1 268 |
Tab. 2 BLEU values of different machine translation methods on various datasets

| Model | NC11 En-De | NC11 De-En | WMT18 En-Tr |
|---|---|---|---|
| Mixed Enc. | — | — | 9.60 |
| Multi-Task | — | — | 10.60 |
| Transformer | 25.00 | 26.60 | 13.10 |
| +Multi-Task | 24.80 | 26.70 | 14.00 |
| +S&H | 25.50 | 26.80 | 13.00 |
| +LISA | 25.30 | 27.10 | 13.60 |
| +PASCAL | 25.90 | 27.40 | 14.00 |
| SSED | 25.97 | 28.44 | 16.51 |

| Model | IWSLT14 De-En valid | IWSLT14 De-En test | IWSLT15 En-Vi tst2012 | IWSLT15 En-Vi tst2013 |
|---|---|---|---|---|
| ELMo | — | — | — | 29.30 |
| CVT | — | — | — | 29.60 |
| SAWR | — | — | — | 29.09 |
| C-MLM | 36.93 | 35.63 | 27.85 | 31.51 |
| Transformer | 35.27 | 34.09 | 27.03 | 30.76 |
| Tied-Transform | — | 35.52 | — | — |
| Dynamic Conv | — | 35.20 | — | — |
| Macaron | — | 35.40 | — | — |
| BERT-fused | — | 36.11 | — | — |
| SSED | 36.85 | 35.53 | 27.95 | 31.60 |
Tab. 3 BLEU values of different fusion methods on IWSLT15 English-Vietnamese tasks

| Method | tst2012 | tst2013 |
|---|---|---|
| Transformer | 27.03 | 30.76 |
| Enc+Syn | 24.87 | 28.04 |
| Syn+Enc | 26.11 | 28.73 |
| Enc//Syn | 27.10 | 31.09 |
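The method names in Tab. 3 suggest serial fusion (Enc+Syn: encoder memory first, then syntax; Syn+Enc: the reverse) versus parallel fusion (Enc//Syn). A minimal PyTorch sketch of these three orders, assuming the decoder cross-attends to an encoder memory `enc` and a syntax memory `syn`; the module names, shapes, and combination layer are hypothetical illustrations, not the paper's code:

```python
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    """Sketch of the three fusion orders compared in Tab. 3 (hypothetical shapes).

    Enc+Syn : cross-attend to the encoder output first, then to syntax.
    Syn+Enc : cross-attend to syntax first, then to the encoder output.
    Enc//Syn: attend to both in parallel and combine the two contexts.
    """
    def __init__(self, d_model: int = 512, n_heads: int = 8, mode: str = "Enc//Syn"):
        super().__init__()
        self.mode = mode
        self.attn_enc = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.attn_syn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.combine = nn.Linear(2 * d_model, d_model)  # used by the parallel variant

    def forward(self, dec, enc, syn):
        if self.mode == "Enc+Syn":        # serial: encoder context feeds syntax attention
            ctx, _ = self.attn_enc(dec, enc, enc)
            ctx, _ = self.attn_syn(ctx, syn, syn)
        elif self.mode == "Syn+Enc":      # serial, opposite order
            ctx, _ = self.attn_syn(dec, syn, syn)
            ctx, _ = self.attn_enc(ctx, enc, enc)
        else:                             # "Enc//Syn": parallel attention, then fuse
            c_enc, _ = self.attn_enc(dec, enc, enc)
            c_syn, _ = self.attn_syn(dec, syn, syn)
            ctx = self.combine(torch.cat([c_enc, c_syn], dim=-1))
        return ctx

# Example: fuse decoder states (batch 2, 7 target steps) with two 9-step memories.
block = FusionBlock(mode="Enc//Syn")
out = block(torch.randn(2, 7, 512), torch.randn(2, 9, 512), torch.randn(2, 9, 512))
```

On this reading, only the parallel combination (Enc//Syn) improves over the Transformer baseline in Tab. 3; both serial orders fall below it.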
Tab. 4 BLEU values of different integration methods on IWSLT15 English-Vietnamese tasks

| Method | tst2012 | tst2013 |
|---|---|---|
| Transformer | 27.03 | 30.76 |
| Average | 26.37 | 29.91 |
| Gate | 26.79 | 31.09 |
| Highway | 26.19 | 29.82 |
| Linear | 27.10 | 31.09 |
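Tab. 4 varies how the two context vectors are merged. A hedged sketch of the four combiners named there, assuming each receives an encoder context `c_enc` and a syntactic context `c_syn` of equal width; the exact gating forms in the paper may differ from these common formulations:

```python
import torch
import torch.nn as nn

class Integrator(nn.Module):
    """Sketch of the four merge strategies in Tab. 4 (hypothetical layer names)."""
    def __init__(self, d_model: int = 512):
        super().__init__()
        self.gate_proj = nn.Linear(2 * d_model, d_model)  # for Gate / Highway
        self.linear = nn.Linear(2 * d_model, d_model)     # for Linear / Highway

    def forward(self, c_enc, c_syn, method: str = "Linear"):
        both = torch.cat([c_enc, c_syn], dim=-1)
        if method == "Average":              # unweighted mean of the two contexts
            return (c_enc + c_syn) / 2
        g = torch.sigmoid(self.gate_proj(both))
        if method == "Gate":                 # learned element-wise convex combination
            return g * c_enc + (1 - g) * c_syn
        if method == "Highway":              # gate between c_syn and a transform of both
            return g * c_syn + (1 - g) * torch.tanh(self.linear(both))
        return self.linear(both)             # "Linear": one projection of the concatenation
```

Under this reading, only the simple learned projection (Linear) matches the best fusion result (27.10/31.09); the parameter-free average and the heavier gated variants trail it.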
Tab. 5 BLEU values of introducing syntactic information to different decoding layers on IWSLT15 English-Vietnamese tasks

| Layer(s) | tst2012 | tst2013 |
|---|---|---|
| Base | 27.03 | 30.76 |
| 1 | 27.24 | 31.11 |
| 2 | 27.32 | 31.20 |
| 3 | 27.37 | 31.00 |
| 4 | 28.07 | 31.10 |
| 5 | 27.95 | 31.60 |
| 6 | 27.40 | 31.47 |
| Enc5 | 27.86 | 31.24 |
| Gate5 | 27.77 | 31.49 |
| 1-6 | 27.10 | 31.09 |
| 1-2 | 27.16 | 31.13 |
| 1-3 | 27.37 | 31.01 |
| 1-4 | 26.72 | 31.07 |
| 4-6 | 26.67 | 30.78 |
| 5-6 | 27.16 | 31.10 |
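Tab. 5 reads as an ablation over which decoder layer(s) receive the syntactic context: single layers 1-6, contiguous ranges, and the Enc5/Gate5 variants. A sketch of how such a per-layer switch could be wired, assuming decoder-layer modules that accept an optional syntax memory; the `syntax_layers` argument and the layer signature are hypothetical, not the paper's implementation:

```python
import torch.nn as nn

class SyntaxAwareDecoder(nn.Module):
    """Decoder stack where only the layers listed in syntax_layers attend to
    the syntactic representation (sketch; not the paper's code)."""
    def __init__(self, layers, syntax_layers=(5,)):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        self.syntax_layers = set(syntax_layers)  # 1-indexed, matching Tab. 5

    def forward(self, dec, enc, syn):
        for i, layer in enumerate(self.layers, start=1):
            # Selected layers see both memories; the rest see the encoder output alone.
            dec = layer(dec, enc, syn if i in self.syntax_layers else None)
        return dec
```

The table suggests injecting syntax into a single upper-middle layer (4 or 5) helps most, while feeding it to every layer (1-6) or to the top layers only (4-6) yields little or no gain over the base model.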
Tab. 6 Examples of comparison between original machine translation results and syntax-enhanced machine translation results

| Example | Source sentence | Reference | Original translation | Syntax-enhanced |
|---|---|---|---|---|
| Example 1 | dies ist mein supermarkt. kein großer. | this is my supermarket. not such a big one. | this is my supermarket. this is not a big deal. | this is my supermarket. it is not a big one. |
| Example 2 | ich habe ihn hier auf meinem laptop. | i have got it here on my laptop. | i have it here on my laptop. | i have got it here on my laptop. |