Journal of Computer Applications, 2018, Vol. 38, Issue 5: 1283-1288. DOI: 10.11772/j.issn.1001-9081.2017102455

• Artificial Intelligence •

Reordering table reconstruction model for Chinese-Uyghur machine translation

PAN Yirong1,2,3, LI Xiao1,3, YANG Yating1,3, MI Chenggang1,3, DONG Rui1,3

  1. Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi, Xinjiang 830011, China;
  2. University of Chinese Academy of Sciences, Beijing 100049, China;
  3. Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi, Xinjiang 830011, China
  • Received: 2017-10-16 Revised: 2017-11-24 Online: 2018-05-10 Published: 2018-05-24
  • Corresponding author: LI Xiao
  • About the authors: PAN Yirong (1992-), female, born in Tianjin, Ph.D. candidate, CCF member; main research interests: natural language processing and machine translation. LI Xiao (1957-), male, born in Urumqi, Xinjiang, M.S., research fellow and doctoral supervisor; main research interests: multilingual information processing and information systems. YANG Yating (1985-), female, born in Qitai, Xinjiang, Ph.D., associate research fellow; main research interest: multilingual information processing. MI Chenggang (1986-), male, born in Weinan, Shaanxi, Ph.D., assistant research fellow; main research interest: multilingual information processing. DONG Rui (1985-), male, born in Tacheng, Xinjiang, Ph.D., assistant research fellow; main research interest: multilingual information processing.
  • Supported by:
    This work is partially supported by the Chinese Academy of Sciences "Light of West China" Program (grant 2015-XBQN-B-10 in the Chinese-language statement, YBXM-2014-04 in the English-language statement), the Major Subject of Science and Technology of Xinjiang Uygur Autonomous Region (2016A03007-3), the Key Laboratory Open Project of Xinjiang Uygur Autonomous Region (2015KL031), and the Natural Science Foundation of Xinjiang Uygur Autonomous Region (2015211B034).

Abstract: Lexicalized reordering models in machine translation suffer from context independence and data sparsity. To address these problems, a reordering table reconstruction model is proposed that predicts reordering orientation and probability from semantic content. First, a continuous distributed representation is used to obtain feature vectors for the reordering rules. Second, a Recurrent Neural Network (RNN) predicts the reordering orientation and probability of each rule from its dense vector representation. Finally, the original reordering table is filtered and reconstructed so that the original rules receive more reasonable reordering probability distributions. This improves the accuracy of the reordering information in the reordering model and reduces the size of the reordering table, which speeds up subsequent decoding. Experimental results show that applying the reordering table reconstruction model to the Chinese-to-Uyghur machine translation task improves the BLEU score by 0.39 points.

Keywords: Chinese-Uyghur machine translation, reordering table reconstruction model, lexicalized reordering, semantic content, continuous distributed representation, Recurrent Neural Network (RNN)
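
The filter-and-reconstruct step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The three orientation classes (monotone, swap, discontinuous) follow the common lexicalized reordering setup and are assumed here. The names `reconstruct_table`, `toy_predict`, `TOY_SCORES`, the `min_confidence` threshold, the placeholder phrase pairs, and all numeric values are hypothetical. The predictor is a fixed score lookup standing in for a trained RNN over rule embeddings, and the abstract does not state the actual filtering criterion.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

# Orientation classes of a common lexicalized reordering setup (assumed here).
ORIENTATIONS = ("monotone", "swap", "discontinuous")

Rule = Tuple[str, str]            # (source phrase, target phrase)
Distribution = Tuple[float, ...]  # one probability per orientation


def softmax(scores: Sequence[float]) -> Distribution:
    """Turn raw scores into a probability distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return tuple(e / total for e in exps)


def reconstruct_table(
    table: Dict[Rule, Distribution],
    predict: Callable[[Rule], Distribution],
    min_confidence: float = 0.5,  # hypothetical filtering threshold
) -> Dict[Rule, Distribution]:
    """Re-score every rule with `predict` and drop low-confidence rules."""
    rebuilt: Dict[Rule, Distribution] = {}
    for rule in table:
        probs = predict(rule)
        # Keep a rule only if the predictor clearly prefers one orientation.
        if max(probs) >= min_confidence:
            rebuilt[rule] = probs
    return rebuilt


# Hypothetical scores standing in for the output layer of a trained RNN.
TOY_SCORES = {
    ("zh_phrase_1", "ug_phrase_1"): (2.0, 0.0, 0.0),
    ("zh_phrase_2", "ug_phrase_2"): (0.0, 0.0, 0.0),
}


def toy_predict(rule: Rule) -> Distribution:
    """Placeholder predictor; a real system would run the RNN on rule embeddings."""
    return softmax(TOY_SCORES.get(rule, (0.0, 0.0, 0.0)))


# Placeholder reordering table with made-up count-based probabilities.
original = {
    ("zh_phrase_1", "ug_phrase_1"): (0.4, 0.3, 0.3),
    ("zh_phrase_2", "ug_phrase_2"): (0.34, 0.33, 0.33),
}
rebuilt = reconstruct_table(original, toy_predict)
```

In this toy run the second rule gets a uniform prediction and is filtered out, so the rebuilt table is smaller than the original and the surviving rule carries the predicted distribution instead of its original one.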

CLC number: