Multi-round conversational reinforcement learning recommendation algorithm via multi-granularity feedback

doi:10.11772/j.issn.1001-9081.2021111875

Journal of Computer Applications, 2023, Vol. 43, Issue (1): 15-21.

Topic: Artificial intelligence

YAO Huayong, YE Dongyi, CHEN Zhaojiong

College of Computer and Big Data, Fuzhou University, Fuzhou Fujian 350108, China

Received: 2021-11-09 Revised: 2022-05-05 Online: 2023-01-12
Contact: YE Dongyi, born in 1964 in Quanzhou, Fujian, Ph. D., professor. His research interests include machine learning. E-mail: yiedy@fzu.edu.cn
About the authors: YAO Huayong, born in 1998 in Nanping, Fujian, M. S. candidate. His research interests include recommendation algorithms. CHEN Zhaojiong, born in 1964 in Fuzhou, Fujian, M. S., professor. Her research interests include machine learning.
Supported by:
This work is partially supported by Fujian Provincial Science and Technology Project (2018H6010).

Abstract

Abstract: A multi-round Conversational Recommendation System (CRS) obtains real-time information from users interactively and can therefore achieve better recommendation performance than traditional methods such as collaborative filtering. However, existing CRSs capture user preferences inaccurately, require too many conversation rounds, and recommend at inappropriate moments. To address these problems, a conversational recommendation algorithm based on deep reinforcement learning that considers users' multi-granularity feedback is proposed. Unlike existing CRSs, the proposed algorithm considers, in each round of conversation, the user's feedback on both the items themselves and the finer-grained item attributes. The features of users, items and item attributes are then updated online with the collected multi-granularity feedback, and the environment state after each round of conversation is analyzed by the Deep Q-Network (DQN) algorithm. This helps the system make more appropriate decisions, so that it can analyze why users buy items and mine their real-time preferences more comprehensively within fewer conversation rounds. Experimental results on two real datasets show that, compared with the Simple Conversational Path Reasoning (SCPR) algorithm, the proposed algorithm increases the 15-turn success rate by 46.5% and shortens the 15-turn average number of turns by 0.314 rounds on the Last.fm dataset, while on the Yelp dataset it maintains the same level of success rate and shortens the 15-turn average number of turns by 0.51 rounds.

Key words: multi-round conversational recommendation system, feedback information, Deep Q-Network (DQN), preference mining, multi-granularity
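
The two evaluation measures quoted in the abstract, the 15-turn success rate and the 15-turn average number of turns, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the use of `None` for a session with no accepted recommendation, and the convention that a failed session counts as the full 15 turns are all assumptions made for illustration.

```python
from typing import Optional, Sequence

MAX_TURNS = 15  # turn budget behind the abstract's "15-turn" measures


def success_rate(turns: Sequence[Optional[int]], max_turns: int = MAX_TURNS) -> float:
    """Fraction of sessions ending with an accepted recommendation within max_turns.

    Each entry is the turn at which the user accepted a recommendation,
    or None if no recommendation was accepted (hypothetical encoding).
    """
    if not turns:
        raise ValueError("need at least one session")
    hits = sum(1 for t in turns if t is not None and t <= max_turns)
    return hits / len(turns)


def average_turns(turns: Sequence[Optional[int]], max_turns: int = MAX_TURNS) -> float:
    """Mean session length in turns.

    Assumption: a failed session is counted as max_turns.
    """
    if not turns:
        raise ValueError("need at least one session")
    lengths = [t if t is not None and t <= max_turns else max_turns for t in turns]
    return sum(lengths) / len(lengths)


# Five made-up sessions: four succeed, one fails.
sessions = [3, 7, None, 15, 10]
sr = success_rate(sessions)
at = average_turns(sessions)
print(f"success rate: {sr:.3f}, average turns: {at:.3f}")
```

Under these assumptions, a higher success rate and a lower average number of turns both indicate that the system reaches an accepted recommendation more reliably and sooner.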

CLC Number:

TP181

YAO Huayong, YE Dongyi, CHEN Zhaojiong. Multi-round conversational reinforcement learning recommendation algorithm via multi-granularity feedback[J]. Journal of Computer Applications, 2023, 43(1): 15-21.
