Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (12): 3679-3685. DOI: 10.11772/j.issn.1001-9081.2021101805
Special Issue: Artificial Intelligence
Neural machine translation integrating bidirectional-dependency self-attention mechanism

Zhijin LI1,2, Hua LAI1,2, Yonghua WEN1,2, Shengxiang GAO1,2
Received: 2021-10-22
Revised: 2022-01-06
Accepted: 2022-01-24
Online: 2022-04-26
Published: 2022-12-10
Contact: Hua LAI
About author: LI Zhijin, born in 1997, M. S. candidate. His research interests include machine translation and natural language processing.
Zhijin LI, Hua LAI, Yonghua WEN, Shengxiang GAO. Neural machine translation integrating bidirectional-dependency self-attention mechanism[J]. Journal of Computer Applications, 2022, 42(12): 3679-3685.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021101805
| Corpus | Total pairs | Training set | Validation set | Test set |
|---|---|---|---|---|
| Chinese-Thai | 1 066 004 | 1 055 002 | 5 001 | 6 001 |
| Chinese-English | 8 021 474 | 8 018 471 | 2 002 | 1 001 |
| English-German | 174 272 | 160 239 | 7 283 | 6 750 |
| Chinese-Thai (small) | 211 002 | 200 000 | 5 001 | 6 001 |
| Chinese-English (small) | 203 003 | 200 000 | 2 002 | 1 001 |

Tab. 1 Details of datasets
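The splits in Tab. 1 carve each parallel corpus into training, validation, and test sets. As a minimal illustration (not the authors' actual preprocessing pipeline), the sketch below shows one way to produce such splits for the Chinese-Thai (small) corpus; the file names `zh-th.zh` / `zh-th.th` and the fixed random seed are assumptions.

```python
# Illustrative corpus split matching the Chinese-Thai (small) row of Tab. 1:
# 211 002 pairs -> 200 000 train / 5 001 validation / 6 001 test.
# NOTE: hypothetical file names; not the paper's actual preprocessing script.
import random

def split_corpus(src_path, tgt_path, n_valid=5001, n_test=6001, seed=42):
    """Shuffle aligned sentence pairs, then carve off test and validation sets."""
    with open(src_path, encoding="utf-8") as f_src, \
         open(tgt_path, encoding="utf-8") as f_tgt:
        pairs = list(zip(f_src.read().splitlines(), f_tgt.read().splitlines()))

    random.Random(seed).shuffle(pairs)  # fixed seed keeps the split reproducible
    test = pairs[:n_test]
    valid = pairs[n_test:n_test + n_valid]
    train = pairs[n_test + n_valid:]
    return train, valid, test

train, valid, test = split_corpus("zh-th.zh", "zh-th.th")
print(len(train), len(valid), len(test))  # 200000 5001 6001 for 211 002 pairs
```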
| Model | Zh→Th | Th→Zh | Zh→En | En→Zh | En→De | De→En | Zh→Th (small) | Th→Zh (small) | Zh→En (small) | En→Zh (small) |
|---|---|---|---|---|---|---|---|---|---|---|
| Transformer | 9.16 | 7.37 | 21.29 | 19.14 | 28.30 | 34.30 | 3.15 | 2.61 | 10.92 | 9.37 |
| Pascal | 9.49 | 7.98 | 21.67 | 19.53 | 28.64 | 34.60 | 3.59 | 3.14 | 11.31 | 9.65 |
| Bi-Dependency | 10.23 | 8.23 | 22.08 | 19.82 | 28.76 | 34.73 | 3.66 | 3.67 | 11.96 | 9.77 |

Tab. 2 BLEU scores of bidirectional translation for different models (Zh: Chinese; Th: Thai; En: English; De: German; "small" denotes the small-scale corpora of Tab. 1)
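All results in Tab. 2-5 are corpus-level BLEU values. For readers who want to reproduce such scores, here is a minimal sketch using the sacrebleu library; the hypothesis and reference strings are placeholders, and the authors' exact scoring setup (e.g., tokenization) may differ.

```python
# Minimal BLEU scoring sketch with sacrebleu; placeholder sentences only.
import sacrebleu

hypotheses = ["the cat sat on the mat"]    # one system output per test sentence
references = [["the cat is on the mat"]]   # one inner list per reference set

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")
```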
| Model | Zh→En | En→Zh | Zh→En (small) | En→Zh (small) |
|---|---|---|---|---|
| Transformer | 21.29 | 19.14 | 10.92 | 9.37 |
| Transformer+CWord | 21.64 | 19.01 | 10.99 | 9.67 |
| Pascal | 21.67 | 19.53 | 11.31 | 9.65 |
| Bi-Dependency | 22.08 | 19.82 | 11.96 | 9.77 |

Tab. 3 Comparison of BLEU values when fusing unidirectional- vs. bidirectional-dependency information
| Attention layer | Zh→En | En→Zh | Zh→En (small) | En→Zh (small) |
|---|---|---|---|---|
| 1 | 22.08 | 19.82 | 11.96 | 9.77 |
| 2 | 21.43 | 18.94 | 11.12 | 9.67 |
| 3 | 21.74 | 19.40 | 10.48 | 9.67 |
| 4 | 21.49 | 18.81 | 10.97 | 9.37 |
| 5 | 21.69 | 18.94 | 11.18 | 9.72 |
| 6 | 21.45 | 18.99 | 10.99 | 9.72 |

Tab. 4 Comparison of BLEU values when fusing bidirectional-dependency information at different attention layers
| Model | Zh→En | En→Zh | Zh→En (small) | En→Zh (small) |
|---|---|---|---|---|
| Transformer | 21.29 | 19.14 | 10.92 | 9.37 |
| Bi-Dependency-GWF | 20.82 | 18.95 | 9.95 | 8.40 |
| Bi-Dependency | 22.08 | 19.82 | 11.96 | 9.77 |

Tab. 5 Comparison of BLEU values before and after adding Gaussian noise
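Tab. 5 contrasts the full Bi-Dependency model with a variant (Bi-Dependency-GWF) that injects Gaussian noise. This excerpt does not specify where the noise enters the model, so purely as an illustration, the sketch below perturbs attention scores with zero-mean Gaussian noise during training; the injection point and standard deviation are assumptions, not the paper's confirmed design.

```python
# Hedged sketch: Gaussian perturbation of attention scores (assumed injection
# point; the paper's "GWF" variant may place the noise elsewhere).
import torch

def noisy_attention_scores(scores: torch.Tensor, std: float = 0.1) -> torch.Tensor:
    """Add zero-mean Gaussian noise (stddev=std) to a tensor of attention scores."""
    return scores + torch.randn_like(scores) * std
```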