Journal of Computer Applications (《计算机应用》) ›› 2025, Vol. 45 ›› Issue (1): 69-74. DOI: 10.11772/j.issn.1001-9081.2023121880

• Artificial Intelligence •

Chinese-Vietnamese neural machine translation model incorporating entity translation

Shengxiang GAO1,2, Zhe HOU1,2, Zhengtao YU1,2, Hua LAI1,2

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming Yunnan 650504, China
  2. Key Laboratory of Artificial Intelligence in Yunnan Province (Kunming University of Science and Technology), Kunming Yunnan 650504, China
  • Received: 2024-01-11 Revised: 2024-04-07 Accepted: 2024-04-10 Online: 2024-05-07 Published: 2025-01-10
  • Contact: Shengxiang GAO
  • About author: HOU Zhe, born in 1997 in Datong, Shanxi, M. S. candidate. His research interests include natural language processing and machine translation.
    YU Zhengtao, born in 1970 in Qujing, Yunnan, Ph. D., professor, CCF member. His research interests include natural language processing, machine translation, information retrieval, and network security.
    LAI Hua, born in 1966 in Kunming, Yunnan, M. S., associate professor. His research interests include intelligent information processing and complex process control.
  • Supported by:
    National Natural Science Foundation of China (62376111); Yunnan High-Tech Industry Development Project (201606); Yunnan Provincial Key Research and Development Program (202303AP140008); Yunnan Provincial Basic Research Program (202001AS070014); Yunnan Provincial Science and Technology Talent and Platform Program (202105AC160018)

Abstract:

In low-resource Chinese-Vietnamese translation, accurately translating the entity words in a sentence is a major difficulty. Entity words occur infrequently in the training corpus, so the model cannot build a mapping between bilingual entity words. To address these problems, a Chinese-Vietnamese neural machine translation model incorporating entity translation is constructed. Firstly, the translations of the entity words in the source sentence are obtained in advance from a Chinese-Vietnamese bilingual entity dictionary. Secondly, these translations are concatenated to the end of the source sentence as the model input, and "constraint prompt information" is introduced at the encoder to enhance the representation. Finally, a pointer network mechanism is integrated at the decoder so that the model can copy words from the source-side sentence into its output. Experimental results show that, compared with the cross-lingual language model XLM-R (Cross-lingual Language Model-RoBERTa), the model improves the BiLingual Evaluation Understudy (BLEU) score by 1.37 points in the Chinese-Vietnamese direction and by 0.21 points in the Vietnamese-Chinese direction. Compared with Transformer, it shortens training time by 3.19% in the Chinese-Vietnamese direction and by 3.50% in the Vietnamese-Chinese direction. The model therefore improves the overall performance of entity word translation in sentences.

Key words: Chinese-Vietnamese neural machine translation, entity translation, bilingual dictionary, pointer network, low-resource
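
The pipeline summarized in the abstract (dictionary lookup, concatenation to the source sentence, copying at the decoder) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The toy dictionary entries, the `<sep>` separator token, the substring-based entity matching, and the helper names `find_entities`, `build_model_input` and `mix_with_copy` are all assumptions made for illustration. The copy step uses the widely known pointer-generator style mixture of a generation distribution and an attention distribution, which may differ from the mechanism used in the paper. The "constraint prompt information" is not specified in the abstract and is therefore left out.

```python
from typing import Dict, List, Tuple

# Hypothetical toy dictionary; entries are illustrative only.
ENTITY_DICT: Dict[str, str] = {
    "河内": "Hà Nội",
    "昆明": "Côn Minh",
}

SEP = "<sep>"  # assumed separator token, not taken from the paper


def find_entities(source: str, dictionary: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (entity, translation) pairs found in `source`, in order of first appearance.

    Matching is plain substring search, a simplification of real entity recognition.
    """
    hits = [(source.find(src), src, tgt)
            for src, tgt in dictionary.items() if src in source]
    hits.sort()
    return [(src, tgt) for _, src, tgt in hits]


def build_model_input(source: str, dictionary: Dict[str, str]) -> str:
    """Append the dictionary translations of detected entities to the source sentence."""
    pairs = find_entities(source, dictionary)
    if not pairs:
        return source
    hints = f" {SEP} ".join(tgt for _, tgt in pairs)
    return f"{source} {SEP} {hints}"


def mix_with_copy(p_gen: float,
                  vocab_probs: Dict[str, float],
                  attention: List[float],
                  source_tokens: List[str]) -> Dict[str, float]:
    """Pointer-generator style mixture (an assumed formulation).

    final(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of attention on source positions holding w
    """
    final = {tok: p_gen * p for tok, p in vocab_probs.items()}
    for tok, weight in zip(source_tokens, attention):
        final[tok] = final.get(tok, 0.0) + (1.0 - p_gen) * weight
    return final


if __name__ == "__main__":
    print(build_model_input("我从昆明飞往河内", ENTITY_DICT))
    print(mix_with_copy(0.5, {"tôi": 0.7, "đi": 0.3}, [0.1, 0.9], ["tôi", "Hà Nội"]))
```

Because the entity translations are part of the encoder input in this sketch, a token such as "Hà Nội" can receive probability through the copy term even when the generation distribution assigns it none, which is the intuition behind combining the appended translations with a pointer mechanism.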

CLC Number: