[1] LI Y C, XIONG D Y, ZHANG M. A survey of neural machine translation[J]. Chinese Journal of Computers, 2018, 41(12): 2734-2755. (in Chinese)
[2] SUTSKEVER I, VINYALS O, LE Q V. Sequence to sequence learning with neural networks[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems. Cambridge, MA: MIT Press, 2014: 3104-3112.
[3] GEHRING J, AULI M, GRANGIER D, et al. Convolutional sequence to sequence learning[C]//Proceedings of the 34th International Conference on Machine Learning. Sydney: PMLR, 2017: 1243-1252.
[4] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 6000-6010.
[5] SCHWENK H. Continuous space translation models for phrase-based statistical machine translation[C]//Proceedings of the 2012 International Conference on Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2012: 1071-1080.
[6] KAISER Ł, BENGIO S. Can active memory replace attention?[C]//Proceedings of the 30th International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2016: 3781-3789.
[7] GU J, BRADBURY J, XIONG C, et al. Non-autoregressive neural machine translation[EB/OL]. [2019-11-20]. https://arxiv.org/pdf/1711.02281.pdf.
[8] WANG Y, TIAN F, HE D, et al. Non-autoregressive machine translation with auxiliary regularization[EB/OL]. [2019-11-20]. https://arxiv.org/pdf/1902.10245.pdf.
[9] GUO J, TAN X, HE D, et al. Non-autoregressive neural machine translation with enhanced decoder input[C]//Proceedings of the 33rd AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2019: 3723-3730.
[10] LEE J, MANSIMOV E, CHO K. Deterministic non-autoregressive neural sequence modeling by iterative refinement[C]//Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2018: 1173-1182.
[11] GHAZVININEJAD M, LEVY O, LIU Y, et al. Mask-Predict: parallel decoding of conditional masked language models[C]//Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2019: 6112-6121.
[12] STERN M, CHAN W, KIROS J, et al. Insertion Transformer: flexible sequence generation via insertion operations[C]//Proceedings of the 36th International Conference on Machine Learning. Long Beach, CA: PMLR, 2019: 5976-5985.
[13] WANG C, ZHANG J, CHEN H. Semi-autoregressive neural machine translation[C]//Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2018: 479-488.
[14] HILL F, CHO K, KORHONEN A. Learning distributed representations of sentences from unlabelled data[EB/OL]. [2019-11-20]. https://arxiv.org/pdf/1602.03483.pdf.
[15] ARTETXE M, LABAKA G, AGIRRE E, et al. Unsupervised neural machine translation[EB/OL]. [2019-11-20]. https://arxiv.org/pdf/1710.11041.pdf.
[16] LAMPLE G, CONNEAU A, DENOYER L, et al. Unsupervised machine translation using monolingual corpora only[EB/OL]. [2019-11-20]. https://arxiv.org/pdf/1711.00043.pdf.
[17] SENNRICH R, HADDOW B, BIRCH A. Neural machine translation of rare words with subword units[C]//Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2016: 1715-1725.
[18] ZHANG X L, LI X, YANG Y T, et al. Analysis of bi-directional reranking model for Uyghur-Chinese neural machine translation[J]. Acta Scientiarum Naturalium Universitatis Pekinensis, 2020, 56(1): 31-38. (in Chinese)
[19] PAPINENI K, ROUKOS S, WARD T, et al. BLEU: a method for automatic evaluation of machine translation[C]//Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2002: 311-318.
[20] GAO Q Q, ZHAO Y, LI G, et al. Compression method of super-resolution convolution neural network based on knowledge distillation[J]. Journal of Computer Applications, 2019, 39(10): 2802-2808. (in Chinese)
[21] HINTON G, VINYALS O, DEAN J. Distilling the knowledge in a neural network[EB/OL]. [2019-11-20]. https://arxiv.org/pdf/1503.02531.pdf.
[22] KIM Y, RUSH A M. Sequence-level knowledge distillation[C]//Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2016: 1317-1327.
[23] KINGMA D P, BA J L. Adam: a method for stochastic optimization[EB/OL]. [2019-11-20]. https://arxiv.org/pdf/1412.6980.pdf.