Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (11): 3386-3394. DOI: 10.11772/j.issn.1001-9081.2021111963
• CCF Bigdata 2021 •
Longchao GONG 1,2, Junjun GUO 1,2, Zhengtao YU 1,2
Received: 2021-11-19
Revised: 2021-11-25
Accepted: 2021-12-06
Online: 2021-12-31
Published: 2022-11-10
Contact: Junjun GUO
About author:
GONG Longchao, born in 1997 in Nanyang, Henan, M. S. candidate, CCF member. His research interests include natural language processing and machine translation.
Longchao GONG, Junjun GUO, Zhengtao YU. Neural machine translation method based on source language syntax enhanced decoding[J]. Journal of Computer Applications, 2022, 42(11): 3386-3394.
URL: http://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021111963
| Dataset | NC11 | WMT18 | IWSLT14 | IWSLT15 |
| --- | --- | --- | --- | --- |
| Training set | 226 000 | 207 000 | 160 000 | 133 000 |
| Validation set | 2 169 | 3 000 | 7 283 | 1 553 |
| Test set | 2 999 | 3 007 | 6 750 | 1 268 |
Tab. 1 Statistics of corpus size in experiments
| Model | NC11 EN-DE | NC11 DE-EN | WMT18 EN-TR | Model | IWSLT14 DE-EN (valid) | IWSLT14 DE-EN (test) | IWSLT15 EN-VI (tst2012) | IWSLT15 EN-VI (tst2013) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Mixed Enc. | — | — | 9.60 | ELMo | — | — | — | 29.30 |
| Multi-Task | — | — | 10.60 | CVT | — | — | — | 29.60 |
| Transformer | 25.00 | 26.60 | 13.10 | SAWR | — | — | — | 29.09 |
| +Multi-Task | 24.80 | 26.70 | 14.00 | C-MLM | 36.93 | 35.63 | 27.85 | 31.51 |
| +S&H | 25.50 | 26.80 | 13.00 | Transformer | 35.27 | 34.09 | 27.03 | 30.76 |
| +LISA | 25.30 | 27.10 | 13.60 | Tied-Transform | — | 35.52 | — | — |
| +PASCAL | 25.90 | 27.40 | 14.00 | Dynamic Conv | — | 35.20 | — | — |
| SSED | 25.97 | 28.44 | 16.51 | Macaron | — | 35.40 | — | — |
|  |  |  |  | BERT-fused | — | 36.11 | — | — |
|  |  |  |  | SSED | 36.85 | 35.53 | 27.95 | 31.60 |
Tab. 2 BLEU values of different machine translation methods on various datasets
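All scores in Table 2 are BLEU values. As an illustration of how such numbers are computed, below is a minimal pure-Python sketch of corpus-level BLEU-4 with a single reference per sentence; the paper's exact evaluation tooling is not specified in this excerpt, so treat this as a reference sketch, not the official scorer:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams of the given order in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus-level BLEU with one reference per hypothesis.

    Accumulates clipped n-gram matches over the whole corpus, then
    combines the geometric mean of precisions with a brevity penalty.
    """
    clipped = [0] * max_n   # matched n-grams, clipped by reference counts
    total = [0] * max_n     # total n-grams proposed by the hypotheses
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            ref_counts = ngrams(ref, n)
            hyp_counts = ngrams(hyp, n)
            clipped[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            total[n - 1] += max(len(hyp) - n + 1, 0)
    if min(clipped) == 0:
        return 0.0  # some n-gram order had no matches
    log_prec = sum(math.log(c / t) for c, t in zip(clipped, total)) / max_n
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / max(hyp_len, 1))
    return bp * math.exp(log_prec)
```

A perfectly matching hypothesis scores 1.0; published BLEU values such as those above are this quantity scaled by 100.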
| Method | tst2012 | tst2013 | Method | tst2012 | tst2013 |
| --- | --- | --- | --- | --- | --- |
| Transformer | 27.03 | 30.76 | Syn+Enc | 26.11 | 28.73 |
| Enc+Syn | 24.87 | 28.04 | Enc//Syn | 27.10 | 31.09 |
Tab. 3 BLEU values of different fusion methods on IWSLT15 English-Vietnamese tasks
| Method | tst2012 | tst2013 | Method | tst2012 | tst2013 |
| --- | --- | --- | --- | --- | --- |
| Transformer | 27.03 | 30.76 | Highway | 26.19 | 29.82 |
| Average | 26.37 | 29.91 | Linear | 27.10 | 31.09 |
| Gate | 26.79 | 31.09 |  |  |  |
Tab. 4 BLEU values of different integration methods on IWSLT15 English-Vietnamese tasks
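Table 4 compares strategies for integrating the syntactic representation into the decoder state: averaging, a learned gate, a linear projection, and a highway connection. Below is a minimal sketch of the averaging and gating variants, using hypothetical scalar weights (`w_enc`, `w_syn`, `bias`) in place of the learned parameter matrices that the model would actually use on full hidden vectors:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def average_fusion(h_enc, h_syn):
    # "Average": element-wise mean of the encoder state and the syntax state.
    return [(a + b) / 2.0 for a, b in zip(h_enc, h_syn)]

def gated_fusion(h_enc, h_syn, w_enc, w_syn, bias):
    # "Gate": a scalar gate g in (0, 1), computed from both states,
    # decides how much of each representation to admit. The weights
    # here stand in for learned parameters.
    g = sigmoid(sum(w * a for w, a in zip(w_enc, h_enc))
                + sum(w * b for w, b in zip(w_syn, h_syn)) + bias)
    return [g * a + (1.0 - g) * b for a, b in zip(h_enc, h_syn)]
```

The gate lets the model modulate the syntactic signal per time step instead of mixing it in at a fixed ratio, which is one plausible reason "Gate" and "Linear" outperform plain averaging on tst2013 in Table 4.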
| Layer | tst2012 | tst2013 | Layer | tst2012 | tst2013 |
| --- | --- | --- | --- | --- | --- |
| Base | 27.03 | 30.76 | Gate5 | 27.77 | 31.49 |
| 1 | 27.24 | 31.11 | 1-6 | 27.10 | 31.09 |
| 2 | 27.32 | 31.20 | 1-2 | 27.16 | 31.13 |
| 3 | 27.37 | 31.00 | 1-3 | 27.37 | 31.01 |
| 4 | 28.07 | 31.10 | 1-4 | 26.72 | 31.07 |
| 5 | 27.95 | 31.60 | 4-6 | 26.67 | 30.78 |
| 6 | 27.40 | 31.47 | 5-6 | 27.16 | 31.10 |
| Enc5 | 27.86 | 31.24 |  |  |  |
Tab. 5 BLEU values of introducing syntactic information to different decoding layers on IWSLT15 English-Vietnamese tasks
| Example | Source sentence | Reference translation | Original translation | Syntax-enhanced |
| --- | --- | --- | --- | --- |
| Example 1 | dies ist mein supermarkt. kein großer. | this is my supermarket. not such a big one. | this is my supermarket. this is not a big deal. | this is my supermarket. it is not a big one. |
| Example 2 | ich habe ihn hier auf meinem laptop. | i have got it here on my laptop. | i have it here on my laptop. | i have got it here on my laptop. |
Tab. 6 Examples of comparison between original machine translation results and syntax-enhanced machine translation results