Journal of Computer Applications, 2020, Vol. 40, Issue (9): 2613-2621. DOI: 10.11772/j.issn.1001-9081.2019112002

• Data science and technology •

Commodity recommendation model based on improved deep Q network structure

FU Kui, LIANG Shaoqing, LI Bing   

  1. School of Economics, Wuhan University of Technology, Wuhan Hubei 430070, China
  • Received: 2019-11-25 Revised: 2020-01-12 Online: 2020-09-10 Published: 2020-09-15
  • Supported by:
    This work is partially supported by the Humanities and Social Sciences Research Foundation of the Ministry of Education of China (17YJA870006).


傅魁, 梁少晴, 李冰   

  1. 武汉理工大学 经济学院, 武汉 430070
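  • Corresponding author: LIANG Shaoqing
  • About the authors: FU Kui (born 1977), male, from Wuhan, Hubei, Ph.D., associate professor; main research interests: intelligent recommendation, data mining, quantitative investment. LIANG Shaoqing (born 1996), male, from Bozhou, Anhui, M.S. candidate; main research interest: intelligent recommendation. LI Bing (born 1983), female, from Tonghua, Jilin, Ph.D., associate professor; main research interests: feature selection, pattern recognition, complex networks, intelligent planning.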

Abstract: Traditional recommendation methods suffer from problems such as data sparsity and poor feature recognition. To address these problems, positive and negative feedback datasets with time-series properties were constructed from implicit feedback. Because both the feedback datasets and commodity purchases have strong time-series features, a Long Short-Term Memory (LSTM) network was introduced as a component of the model. Considering that the user’s own characteristics and the returns of action selection are determined by different input data, the deep Q network based on the competitive architecture was improved: by integrating users’ positive and negative feedback with the time-series features of commodity purchases, a commodity recommendation model based on the improved deep Q network structure was designed. The model treats positive and negative feedback data differently during training and extracts the time-series features of commodity purchases. On the Retailrocket dataset, compared with the best-performing of the Factorization Machine (FM), Wide & Deep learning (W&D) and Collaborative Filtering (CF) models, the proposed model improved precision, recall, Mean Average Precision (MAP) and Normalized Discounted Cumulative Gain (NDCG) by 158.42%, 89.81%, 95.00% and 65.67%, respectively. In addition, Dueling Bandit Gradient Descent (DBGD) was used as the exploration method to mitigate the low diversity of recommended commodities.
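For readers unfamiliar with the four evaluation metrics named in the abstract, the following is a minimal sketch of how they are commonly computed for a single user's top-k recommendation list with binary relevance. It is not the authors' evaluation code: the function names, the cutoff k, the normalization of average precision by min(|relevant|, k), and the reading of the reported percentages as relative improvements over a baseline are all assumptions made for illustration. MAP would be the mean of `average_precision` over all users.

```python
import math

def precision_recall_at_k(ranked, relevant, k):
    """Precision@k and recall@k for one user (binary relevance)."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    recall = hits / len(relevant) if relevant else 0.0
    return hits / k, recall

def average_precision(ranked, relevant, k):
    """AP@k; normalizing by min(|relevant|, k) is one common convention (assumed here)."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / rank  # precision at each rank where a hit occurs
    denom = min(len(relevant), k)
    return total / denom if denom else 0.0

def ndcg_at_k(ranked, relevant, k):
    """NDCG@k with gain 1 for a relevant item and a log2 position discount."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, item in enumerate(ranked[:k], start=1)
              if item in relevant)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0

def relative_gain_percent(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent (assumed reading)."""
    return (new - baseline) / baseline * 100.0

# Hypothetical example: four recommended items, three items actually purchased.
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}
print(precision_recall_at_k(ranked, relevant, 4))
print(average_precision(ranked, relevant, 4))
print(ndcg_at_k(ranked, relevant, 4))
```

Under this reading, a metric that rises from 0.25 to 0.5 corresponds to `relative_gain_percent(0.5, 0.25)`, that is, a 100% improvement.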

Key words: deep reinforcement learning, positive and negative feedback dataset, competitive network architecture, Long Short-Term Memory (LSTM) network, commodity recommendation

摘要: 传统推荐方法存在数据稀疏和特征识别差等问题,为了解决这些问题,根据隐式反馈构建具有时序性的正负反馈数据集。由于正负反馈数据集和商品购买具有强时序性特征,引入长短期记忆(LSTM)网络作为模型构件。考虑用户自身特征和用户动作选择回报由不同的输入数据决定,对竞争架构的深度Q网络进行改进,融合用户正负反馈和商品购买时序性,设计了基于改进的深度Q网络结构的商品推荐模型。模型对正负反馈数据进行区分性训练,对商品购买的时序性特征进行提取。在Retailrocket数据集上,与因子分解机(FM)模型、W&D模型和协同过滤(CF)模型中表现最好的相比,所提模型的准确率、召回率、平均准确率(MAP)和归一化折损累计增益(NDCG)分别提高了158.42%、89.81%、95.00%和67.57%。同时,使用DBGD作为探索方法,改善了推荐商品多样性低的缺陷。

关键词: 深度强化学习, 正负反馈数据集, 竞争网络架构, 长短期记忆网络, 商品推荐
