Journal of Computer Applications ›› 2017, Vol. 37 ›› Issue (9): 2665-2670. DOI: 10.11772/j.issn.1001-9081.2017.09.2665

• Data Science and Technology •

Recommendation method based on k nearest neighbors using data dimensionality reduction and exact Euclidean locality-sensitive hashing

GUO Yudong, GUO Zhigang, CHEN Gang, WEI Han

  1. College of Information System Engineering, Information Engineering University, Zhengzhou, Henan 450002, China
  • Received: 2017-03-22 Revised: 2017-05-22 Online: 2017-09-10 Published: 2017-09-13
  • Corresponding author: GUO Zhigang, 20374042@qq.com
  • About the authors: GUO Yudong (born 1991), male, from Mengjin, Henan, is a master's student; his research interests include machine learning and data mining. GUO Zhigang (born 1973), male, from Zhengzhou, Henan, is an associate professor with a master's degree; his research interests include intelligent information processing, machine learning and data mining. CHEN Gang (born 1979), male, from Huanggang, Hubei, is a lecturer and doctoral candidate; his research interests include intelligent information processing, machine learning and data mining. WEI Han (born 1981), female, from Zhengzhou, Henan, is a lecturer with a master's degree; her research interests include intelligent information processing, machine learning and data mining.
  • Supported by:
    This work is partially supported by the National Social Science Foundation of China (14BXW028).

Abstract: Collaborative filtering recommendation based on k nearest neighbors suffers from high-dimensional rating features, slow nearest-neighbor search, and the rating cold-start problem. To address these problems, a k-nearest-neighbor collaborative filtering recommendation method based on data dimensionality reduction and Exact Euclidean Locality-Sensitive Hashing (E2LSH) is proposed. Firstly, the rating data, the user attribute data and the item category data are fused and used as input to train a Stacked Denoising Auto-encoder (SDA) neural network; the values of the last hidden layer of the encoder part are taken as the feature code of the input data, which completes the nonlinear dimensionality reduction. Then, an index of the reduced-dimension data is built with the E2LSH algorithm, and the similar nearest neighbors of the target user or the target item are obtained by retrieval. Finally, the similarities between the target and its neighbors are calculated, and the neighbors' rating records are weighted by these similarities to obtain the target user's predicted rating for the target item. Experimental results on standard data sets show that, in the cold-start scenario, the root mean square error of the proposed method is about 7.2% lower on average than that of the recommendation method based on Locality-Sensitive Hashing (LSH-ICF), while its average running time is comparable to that of LSH-ICF. This indicates that the proposed method alleviates the rating cold-start problem while maintaining recommendation efficiency.
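
The retrieval and prediction steps summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class `E2LSHIndex`, the function `predict_rating`, the parameter values, and the toy codes and ratings are all hypothetical. The SDA encoder is not shown; hand-written low-dimensional vectors stand in for its output. The similarity 1 / (1 + distance) is likewise an assumed choice, since the abstract does not name the similarity measure.

```python
import math
import random
from collections import defaultdict


class E2LSHIndex:
    """Toy E2LSH index: n_tables hash tables, each keyed by n_hashes hashes."""

    def __init__(self, dim, n_hashes=4, n_tables=8, width=4.0, seed=0):
        rng = random.Random(seed)
        self.width = width
        # Each hash is h(v) = floor((a . v + b) / width), with Gaussian a
        # (2-stable distribution) and a uniform offset b drawn from [0, width].
        self.funcs = [
            [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
             for _ in range(n_hashes)]
            for _ in range(n_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.points = {}

    def _key(self, table_id, vec):
        # Concatenate the n_hashes bucket ids into one key for this table.
        return tuple(
            math.floor((sum(a_i * v_i for a_i, v_i in zip(a, vec)) + b) / self.width)
            for a, b in self.funcs[table_id]
        )

    def add(self, name, vec):
        self.points[name] = vec
        for t, table in enumerate(self.tables):
            table[self._key(t, vec)].append(name)

    def query(self, vec, n_neighbors):
        # Collect every point that shares a bucket with vec in any table,
        # then rank the candidates by exact Euclidean distance.
        candidates = set()
        for t, table in enumerate(self.tables):
            candidates.update(table.get(self._key(t, vec), ()))
        ranked = sorted(candidates, key=lambda n: math.dist(vec, self.points[n]))
        return ranked[:n_neighbors]


def predict_rating(index, target_vec, ratings, item, n_neighbors=3):
    """Similarity-weighted mean of the retrieved neighbors' ratings for item."""
    num = den = 0.0
    for name in index.query(target_vec, n_neighbors):
        if item in ratings.get(name, {}):
            # Assumed similarity: 1 / (1 + Euclidean distance).
            sim = 1.0 / (1.0 + math.dist(target_vec, index.points[name]))
            num += sim * ratings[name][item]
            den += sim
    return num / den if den else None  # None if no neighbor rated the item


# Hypothetical reduced-dimension user codes (stand-ins for SDA output).
codes = {
    "u1": [0.9, 0.1, 0.3],
    "u2": [0.8, 0.2, 0.4],
    "u3": [0.1, 0.9, 0.7],
}
ratings = {"u1": {"i1": 4.0}, "u2": {"i1": 5.0}, "u3": {"i1": 1.0}}

index = E2LSHIndex(dim=3)
for name, vec in codes.items():
    index.add(name, vec)

# A cold-start user with no ratings, represented only by its code.
new_user_code = [0.85, 0.15, 0.35]
neighbors = index.query(new_user_code, n_neighbors=2)
predicted = predict_rating(index, new_user_code, ratings, "i1", n_neighbors=2)
print(neighbors, predicted)
```

Because a point only becomes a candidate when it shares a bucket with the query in at least one table, `query` may return fewer than `n_neighbors` results, and `predict_rating` returns `None` when none of the retrieved neighbors has rated the item. Larger `width` or more tables raise the collision probability at the cost of more candidates to rank.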

Key words: information recommendation, Stacked Denoising Auto-encoder (SDA), Exact Euclidean Locality-Sensitive Hashing (E2LSH), data dimensionality reduction, cold start

CLC Number: