Journal of Computer Applications, 2023, Vol. 43, Issue (1): 15-21. DOI: 10.11772/j.issn.1001-9081.2021111875

Topic: Artificial Intelligence


Multi-round conversational reinforcement learning recommendation algorithm via multi-granularity feedback

YAO Huayong, YE Dongyi, CHEN Zhaojiong

  1. College of Computer and Big Data, Fuzhou University, Fuzhou Fujian 350108, China
  • Received: 2021-11-09 Revised: 2022-05-05 Online: 2023-01-12
  • Contact: YE Dongyi, born in 1964 in Quanzhou, Fujian, Ph. D., professor. His research interests include machine learning. E-mail: yiedy@fzu.edu.cn
  • About author: YAO Huayong, born in 1998 in Nanping, Fujian, M. S. candidate. His research interests include recommendation algorithms; CHEN Zhaojiong, born in 1964 in Fuzhou, Fujian, M. S., professor. Her research interests include machine learning.
  • Supported by:
    This work is partially supported by Fujian Provincial Science and Technology Project (2018H6010).


Abstract: A multi-round Conversational Recommendation System (CRS) obtains users' real-time information interactively, and therefore performs better than traditional recommendation methods such as collaborative filtering. However, existing CRSs capture user preferences inaccurately, require too many conversation rounds, and often recommend at inappropriate moments. To address these problems, a conversational recommendation algorithm based on deep reinforcement learning that takes users' multi-granularity feedback into account was proposed. Unlike existing CRSs, the proposed algorithm considers, in every conversation round, the user's feedback on both the items themselves and their finer-grained attributes. The user, item and item-attribute features are then updated online with the collected multi-granularity feedback, and a Deep Q-Network (DQN) analyzes the environment state after each round. As a result, the system makes more appropriate decisions, identifies why users buy items, and mines users' real-time preferences more comprehensively in fewer conversation rounds. Experimental results on two real-world datasets show that, compared with the Simple Conversational Path Reasoning (SCPR) algorithm, the proposed algorithm increases the success rate within 15 turns by 46.5% and reduces the average number of turns (15-turn limit) by 0.314 on the Last.fm dataset; on the Yelp dataset it maintains a comparable success rate while reducing the average number of turns (15-turn limit) by 0.51.
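The two evaluation figures quoted in the abstract, the success rate within 15 turns and the average number of turns under a 15-turn limit, can be illustrated with the minimal Python sketch below. It assumes the convention commonly used in SCPR-style CRS evaluation: a conversation succeeds if the user accepts the target item within T = 15 turns, and a failed conversation is charged the full T turns. The `Session` type and the function names are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Session:
    """One simulated conversation (hypothetical record type).

    success_turn is the 1-based turn at which the user accepted the target
    item, or None if the conversation never succeeded.
    """
    success_turn: Optional[int]


def success_rate_at(sessions: Sequence[Session], max_turns: int = 15) -> float:
    """SR@T: fraction of sessions whose target item is accepted within max_turns."""
    hits = sum(
        1 for s in sessions
        if s.success_turn is not None and s.success_turn <= max_turns
    )
    return hits / len(sessions)


def average_turns_at(sessions: Sequence[Session], max_turns: int = 15) -> float:
    """AT@T: mean turns per session.

    Assumption: sessions that fail (or succeed only after max_turns) are
    charged the full max_turns.
    """
    total = 0
    for s in sessions:
        if s.success_turn is not None and s.success_turn <= max_turns:
            total += s.success_turn
        else:
            total += max_turns
    return total / len(sessions)


if __name__ == "__main__":
    demo = [Session(3), Session(7), Session(None), Session(15)]
    print(success_rate_at(demo))   # 0.75
    print(average_turns_at(demo))  # (3 + 7 + 15 + 15) / 4 = 10.0
```

Because failed sessions are charged the full T turns in this sketch, a higher success rate also tends to lower the average turn count, so the two metrics are not independent of each other.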

Key words: multi-round conversational recommendation system, feedback information, Deep Q-Network (DQN), preference mining, multi-granularity
