Journal of Computer Applications ›› 2017, Vol. 37 ›› Issue (10): 2813-2818. DOI: 10.11772/j.issn.1001-9081.2017.10.2813

• Artificial Intelligence •

Dialog generation based on hierarchical encoding and deep reinforcement learning

ZHAO Yuqing, XIANG Yang

  1. Department of Computer Science and Technology, Tongji University, Shanghai 201800, China
  • Received: 2017-04-28 Revised: 2017-06-14 Online: 2017-10-10 Published: 2017-10-16
  • Corresponding author: XIANG Yang, E-mail: shxiangyang@tongji.edu.cn
  • About the authors: ZHAO Yuqing (born 1995), female, from Panjin, Liaoning, master's student; main research interests: natural language processing, deep learning. XIANG Yang (born 1962), male, from Chongqing, professor, Ph.D.; main research interests: management information systems, cloud computing, semantic computing, big data mining.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (71571136), the National Basic Research Program (973 Program) of China (2014CB340404), and the Shanghai Municipal Science and Technology Research Project (16JC1403000).

Abstract: Standard sequence-to-sequence (seq2seq) architectures trained with a Maximum Likelihood Estimation (MLE) objective tend to produce highly generic responses. To address this problem in dialog generation, a model based on hierarchical encoding and deep reinforcement learning, named Enhanced Hierarchical Recurrent Encoder-Decoder (EHRED), was proposed. The method combines hierarchical encoding with reinforcement learning. Multi-turn dialog is modeled with a hierarchical encoder, in which an intermediate layer added to the standard seq2seq architecture strengthens the memory of earlier utterances in the dialog. A language model is then used to construct the reward function, and the policy gradient method from reinforcement learning replaces the original MLE loss function for training. Experimental results show that EHRED generates responses with richer semantic information and, in standard human evaluation, outperforms the widely used standard seq2seq Recurrent Neural Network (RNN) dialog generation model by 5.7 to 11.1 percentage points.

Key words: dialog generation, deep reinforcement learning, hierarchical encoding, recurrent neural network, sequence to sequence (seq2seq)
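
The training idea summarized in the abstract (score a sampled response with a language model, then update the generator by policy gradient instead of maximizing likelihood) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the three candidate responses, the toy unigram counts, the reward definition in `lm_reward` (mean negative log-probability per token, so less generic wording scores higher), the fixed baseline, and the learning rate. The policy is a plain softmax over whole candidate responses, not the hierarchical encoder-decoder, so the block shows only the REINFORCE update itself.

```python
import math
import random

# Hypothetical candidate responses; a real generator samples token by token.
CANDIDATES = ["i do not know", "the train leaves at nine", "ok"]

# Toy unigram "language model" counts (assumed values, not from the paper).
UNIGRAM_COUNTS = {"i": 50, "do": 40, "not": 40, "know": 30, "ok": 60,
                  "the": 45, "train": 3, "leaves": 2, "at": 20, "nine": 2}
TOTAL = sum(UNIGRAM_COUNTS.values())


def lm_reward(response):
    """Stand-in reward: mean negative log-probability per token under the
    toy unigram model, so rarer (less generic) wording scores higher."""
    tokens = response.split()
    return -sum(math.log(UNIGRAM_COUNTS[t] / TOTAL) for t in tokens) / len(tokens)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def reinforce_step(logits, rng, lr=0.1, baseline=0.0):
    """One REINFORCE update: sample a response from the policy, score it,
    and move the logits along (reward - baseline) * grad log pi(response)."""
    probs = softmax(logits)
    k = rng.choices(range(len(logits)), weights=probs)[0]
    advantage = lm_reward(CANDIDATES[k]) - baseline
    # For a softmax policy, d log pi(k) / d logit_j = 1[j == k] - probs[j].
    return [x + lr * advantage * ((1.0 if j == k else 0.0) - probs[j])
            for j, x in enumerate(logits)]


def train(steps=2000, seed=0):
    """Run repeated REINFORCE updates and return the final policy."""
    rng = random.Random(seed)
    logits = [0.0] * len(CANDIDATES)
    # Fixed baseline: the average reward over all candidates (an assumption).
    baseline = sum(lm_reward(c) for c in CANDIDATES) / len(CANDIDATES)
    for _ in range(steps):
        logits = reinforce_step(logits, rng, baseline=baseline)
    return softmax(logits)
```

Calling `train()` returns the final probabilities over the three candidates. Under these toy counts, probability mass shifts toward the candidate with the highest reward, which is the least generic one. A likelihood-only objective has no term that expresses such a preference.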

CLC Number: