Journal of Computer Applications, 2018, Vol. 38, Issue (4): 1001-1006. DOI: 10.11772/j.issn.1001-9081.2017092314

• Data Science and Technology •

Collaborative filtering recommendation algorithm based on improved clustering and matrix factorization

WANG Yonggui, SONG Zhenzhen, XIAO Chenglong

  1. College of Software, Liaoning Technical University, Huludao, Liaoning 125105, China
  • Received: 2017-09-25  Revised: 2017-11-14  Online: 2018-04-10  Published: 2018-04-09
  • Corresponding author: SONG Zhenzhen
  • About the authors: WANG Yonggui (1967-), male, born in Ningcheng, Inner Mongolia, professor, M.S., CCF member; research interests: big data, databases, data warehouses. SONG Zhenzhen (1989-), female, born in Anyang, Henan, M.S. candidate; research interests: recommendation algorithms, data mining. XIAO Chenglong (1984-), male, born in Qiyang, Hunan, associate professor, Ph.D., CCF member; research interests: hardware/software co-design, embedded systems, high-level synthesis.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61404069) and the Liaoning Provincial Department of Education Scientific Research Project (LJYL048).

Abstract: In the big data era, traditional collaborative filtering recommendation algorithms used in e-commerce systems suffer from data sparsity, low accuracy, and insufficient real-time performance. To address these problems, an improved collaborative filtering recommendation algorithm is proposed. The algorithm first applies matrix factorization to reduce the dimensionality of the original data and to fill in missing ratings. It also introduces a time decay function to preprocess user ratings. Each item is then represented by its attribute vector and each user by an interest vector, and users and items are clustered separately with the k-means algorithm. Finally, an improved similarity measure is used within the clusters to find each user's nearest neighbors and the candidate set of items to recommend, and recommendations are generated from them. Experimental results show that the algorithm not only effectively addresses data sparsity and the cold-start problem caused by new items, but also reflects changes in user interest across multiple dimensions, and it noticeably improves recommendation accuracy.

Key words: collaborative filtering, clustering, time decay, interest vector, matrix factorization
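The abstract says matrix factorization is used both to reduce the dimensionality of the rating data and to fill in missing ratings. It does not say which factorization is used. The following is a minimal sketch that substitutes a latent factor model trained by stochastic gradient descent (Funk-SVD style). Each user u and item i get k-dimensional factors, and a missing rating is filled with their dot product. The function name `factorize` and the values of `k`, `lr`, `reg`, and `steps` are hypothetical illustration choices, not the authors' settings.

```python
import random


def factorize(ratings, k=2, steps=2000, lr=0.01, reg=0.02, seed=0):
    """Fill a sparse rating matrix with a low-rank SGD factorization.

    Hypothetical stand-in for the paper's (unspecified) factorization step.
    `ratings` is a list of rows; None marks a missing rating. Returns a dense
    matrix whose cell (u, i) is the dot product of the learned factors.
    """
    rng = random.Random(seed)
    n_users, n_items = len(ratings), len(ratings[0])
    # Small random initial factors: P holds user factors, Q item factors.
    P = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_items)]
    observed = [(u, i, r) for u, row in enumerate(ratings)
                for i, r in enumerate(row) if r is not None]

    for _ in range(steps):
        for u, i, r in observed:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)

    return [[sum(P[u][f] * Q[i][f] for f in range(k)) for i in range(n_items)]
            for u in range(n_users)]


if __name__ == "__main__":
    R = [[5, 3, None, 1],
         [4, None, None, 1],
         [1, 1, None, 5],
         [1, None, None, 4],
         [None, 1, 5, 4]]
    for row in factorize(R):
        print([round(x, 2) for x in row])
```

The dimensionality reduction is in the factors themselves. Each user is summarized by k numbers instead of one rating per item. The dense output is what a later neighbor search could use in place of the sparse matrix.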
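The abstract introduces a time decay function to preprocess user ratings so that recent behavior counts more than old behavior. It does not give the function's form. The following is a minimal sketch that assumes an exponential forgetting curve with a half-life. A rating made `half_life` time units before `t_now` gets weight 0.5, one made two half-lives earlier gets 0.25, and so on. The names `time_decay`, `decay_weights`, `weighted_mean_rating`, and `half_life` are hypothetical.

```python
def time_decay(t_rating, t_now, half_life):
    """Exponential forgetting weight in (0, 1].

    Hypothetical form: w = 0.5 ** ((t_now - t_rating) / half_life).
    A rating made at t_now gets 1.0; timestamps after t_now are clamped to 1.0.
    """
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    age = max(0, t_now - t_rating)
    return 0.5 ** (age / half_life)


def decay_weights(records, t_now, half_life):
    """Map (user, item, rating, timestamp) records to {(user, item): (rating, weight)}."""
    return {(u, i): (r, time_decay(t, t_now, half_life)) for u, i, r, t in records}


def weighted_mean_rating(records, user, t_now, half_life):
    """Time-weighted average rating of one user: recent ratings dominate."""
    pairs = [(r, time_decay(t, t_now, half_life)) for u, _, r, t in records if u == user]
    total = sum(w for _, w in pairs)
    return sum(r * w for r, w in pairs) / total if total else 0.0
```

For example, suppose a user rated item `a` with 5 at time 40 and item `b` with 1 at time 100, with `t_now = 100` and `half_life = 30`. The weights are 0.25 and 1.0, so the time-weighted mean is (5 × 0.25 + 1 × 1.0) / 1.25 = 1.8 rather than the plain mean of 3.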
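The abstract represents each item by an attribute vector and each user by an interest vector, but it does not spell out how the interest vector is derived. The following is a minimal sketch under two assumptions. First, item attributes are binary flags (for example, genres). Second, a user's interest in attribute j is the share of that user's weighted rating mass that falls on items having attribute j:

interest_j = Σ_i w_i · r_i · a_ij / Σ_i w_i · r_i

The optional per-item `weights` could carry time-decay weights. Both `interest_vector` and this formula are hypothetical, not the paper's definition.

```python
def interest_vector(user_ratings, item_attrs, weights=None):
    """Build a user's interest vector from the attribute vectors of rated items.

    user_ratings: {item_id: rating}
    item_attrs:   {item_id: [0/1 attribute flags]}, all the same length (non-empty)
    weights:      optional {item_id: weight}, e.g. time-decay weights (default 1.0)
    Returns [interest_j] with interest_j = sum(w*r*a_j) / sum(w*r).
    """
    n_attrs = len(next(iter(item_attrs.values())))
    totals = [0.0] * n_attrs
    mass = 0.0
    for item, rating in user_ratings.items():
        w = 1.0 if weights is None else weights.get(item, 1.0)
        contrib = w * rating
        mass += contrib
        for j, flag in enumerate(item_attrs[item]):
            totals[j] += contrib * flag
    # A user with no rating mass gets an all-zero interest vector.
    return [t / mass for t in totals] if mass else totals
```

Users and items then live in the same attribute space. This is one way a new, unrated item could still be matched to interested users through its attributes alone.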
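The abstract clusters users (by interest vector) and items (by attribute vector) with k-means. This lets the later neighbor search run inside one cluster instead of over the whole data set. The following is a minimal sketch of standard Lloyd's k-means, assuming squared Euclidean distance and centroids seeded by sampling k input points with a fixed seed. The seeding strategy, `k`, and the name `kmeans` are illustration choices, not details from the paper.

```python
import random


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means. Returns (labels, centroids); labels[n] is the cluster of points[n]."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return labels, centroids
```

Applied to interest vectors such as `[[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]` with `k=2`, the first two users and the last two users end up in separate clusters. The same function can cluster item attribute vectors.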
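The abstract searches for a user's nearest neighbors and a candidate item set inside the user's cluster. It uses an improved similarity measure that the abstract does not define. The following is a minimal sketch with two substitutions. Plain vector cosine similarity stands in for that measure, with unrated items counting as 0. A similarity-weighted average of the neighbors' ratings scores each candidate item. `recommend`, `n_neighbors`, and `top_n` are hypothetical names.

```python
import math


def cosine(a, b):
    """Vector cosine of two users' {item: rating} dicts; unrated items count as 0."""
    dot = sum(a[i] * b[i] for i in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend(target, ratings, labels, n_neighbors=2, top_n=2):
    """Recommend items to `target` using only neighbors from the same cluster.

    ratings: {user: {item: rating}}; labels: {user: cluster id}, e.g. from k-means.
    """
    peers = [u for u in ratings if u != target and labels[u] == labels[target]]
    ranked = sorted(((cosine(ratings[target], ratings[u]), u) for u in peers), reverse=True)
    neighbors = [(s, u) for s, u in ranked[:n_neighbors] if s > 0]

    # Candidate set: items some neighbor rated that the target has not rated.
    candidates = {i for _, u in neighbors for i in ratings[u]} - set(ratings[target])
    scores = {}
    for item in candidates:
        num = sum(s * ratings[u][item] for s, u in neighbors if item in ratings[u])
        den = sum(s for s, u in neighbors if item in ratings[u])
        scores[item] = num / den  # den > 0: a neighbor with s > 0 rated this item
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]
```

Restricting `peers` to the target's cluster is what shrinks the neighbor search from all users to one cluster. Swapping `cosine` for a different similarity measure leaves the rest of the sketch unchanged.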
