Journal of Computer Applications, 2018, Vol. 38, Issue (3): 633-638. DOI: 10.11772/j.issn.1001-9081.2017071718

• Artificial Intelligence •

Collaborative filtering recommendation algorithm based on multi-level hybrid similarity

YUAN Zhengwu, CHEN Ran

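  1. College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Received: 2017-07-14 Revised: 2017-09-10 Online: 2018-03-10 Published: 2018-03-07
  • Corresponding author: CHEN Ran
  • About the authors: YUAN Zhengwu (1968-), male, born in Yiyang, Hunan; professor, Ph.D., CCF member. His research interests include remote sensing technology, big data, and cloud computing. CHEN Ran (1992-), male, born in Jingzhou, Hubei; M.S. candidate. His research interests include data mining and recommendation algorithms.
  • Supported by:
    National Natural Science Foundation of China (61471077); Program for Changjiang Scholars and Innovative Research Teams in Universities (IRT1299).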

Collaborative filtering recommendation algorithm based on multi-level hybrid similarity

YUAN Zhengwu, CHEN Ran   

  1. College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Received: 2017-07-14 Revised: 2017-09-10 Online: 2018-03-10 Published: 2018-03-07
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61471077) and the Program for Changjiang Scholars and Innovative Research Teams in Universities (IRT1299).

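Abstract: Traditional collaborative filtering recommendation algorithms perform poorly on sparse data, and their similarity measures have known shortcomings. To improve recommendation accuracy, a collaborative filtering recommendation algorithm based on multi-level hybrid similarity is proposed. The algorithm works at three levels. First, user ratings are fuzzified using the concept of fuzzy sets to compute each user's fuzzy preference, which is combined with the adjusted cosine similarity and the Jaccard similarity of the user ratings to form the user rating similarity. Second, user ratings are grouped by item category to predict each user's degree of interest in each category, from which the user interest similarity is computed. Third, users' characteristic attributes are used to predict the characteristic similarity between users. The interest similarity and the characteristic similarity are then fused dynamically according to the number of ratings a user has made, and finally the similarities of the three levels are fused to give the hybrid user similarity. Comparative experiments on the public MovieLens dataset show that, when the neighbor set is small, the improved hybrid algorithm reduces the Mean Absolute Error (MAE) by about 5% relative to the adjusted cosine similarity algorithm. It also holds a slight advantage over the collaborative filtering algorithm based on the modified Jaccard similarity coefficient (MKJCF), with the MAE about 1% lower on average as the number of neighbors increases. By adopting a multi-level recommendation strategy, the algorithm improves recommendation accuracy and effectively mitigates the data sparsity problem and the limitations of relying on a single similarity measure.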
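The rating-level similarity described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so several choices here are assumptions: the adjusted cosine is computed over co-rated items only, the two measures are combined by a simple product, and the fuzzy-preference term is omitted. The function names and the `ratings` layout (user -> {item: rating}) are hypothetical.

```python
from math import sqrt


def adjusted_cosine(ratings, u, v):
    """Adjusted cosine similarity of users u and v over their co-rated items.

    Each rating is centered on that user's own mean rating (assumed variant).
    """
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    mean_u = sum(ratings[u].values()) / len(ratings[u])
    mean_v = sum(ratings[v].values()) / len(ratings[v])
    num = sum((ratings[u][i] - mean_u) * (ratings[v][i] - mean_v) for i in common)
    den_u = sqrt(sum((ratings[u][i] - mean_u) ** 2 for i in common))
    den_v = sqrt(sum((ratings[v][i] - mean_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0
    return num / (den_u * den_v)


def jaccard(ratings, u, v):
    """Jaccard similarity: shared rated items divided by all rated items."""
    items_u, items_v = set(ratings[u]), set(ratings[v])
    union = items_u | items_v
    if not union:
        return 0.0
    return len(items_u & items_v) / len(union)


def rating_similarity(ratings, u, v):
    """Assumed combination: Jaccard overlap scales the adjusted cosine."""
    return jaccard(ratings, u, v) * adjusted_cosine(ratings, u, v)


# Toy data, invented for illustration only.
ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob": {"m1": 4, "m2": 2, "m4": 5},
}
print(jaccard(ratings, "alice", "bob"))  # 2 shared items out of 4 -> 0.5
print(rating_similarity(ratings, "alice", "bob"))
```

Scaling by the Jaccard coefficient penalizes user pairs who agree on only a handful of co-rated items, which is one plausible way to temper the adjusted cosine on sparse data.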

Keywords: collaborative filtering, data sparsity, fuzzy set, rating similarity, interest similarity, characteristic similarity

Abstract: To address the poor performance of traditional collaborative filtering recommendation algorithms on sparse data and the shortcomings of their similarity measures, a collaborative filtering recommendation algorithm based on multi-level hybrid similarity is proposed to improve recommendation accuracy. The algorithm is divided into three levels. Firstly, the concept of fuzzy sets is used to fuzzify the user ratings and calculate each user's fuzzy preference, which is combined with the adjusted cosine similarity and the Jaccard similarity of the user ratings to form the user rating similarity. Secondly, the user ratings are classified to predict each user's degree of interest in each item category, from which the user interest similarity is calculated. Thirdly, the characteristic similarity between users is predicted from the users' characteristic attributes. Then, the interest similarity and the characteristic similarity are dynamically fused according to the number of user ratings. Finally, the similarities of the three levels are fused to give the hybrid user similarity. Experimental results on the public MovieLens dataset show that the improved hybrid algorithm reduces the Mean Absolute Error (MAE) by about 5% compared with the adjusted cosine similarity algorithm when the number of neighbors is small. Compared with the improved MKJCF (Modified K-pow Jaccard similarity Cooperative Filtering) algorithm, the improved hybrid algorithm also has a slight advantage, with the MAE about 1% lower on average as the number of neighbors increases. The proposed algorithm uses a multi-level recommendation strategy to improve recommendation accuracy, and effectively alleviates the data sparsity problem and the impact of relying on a single similarity measure.
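The dynamic fusion step and the MAE metric mentioned in the abstract can be sketched as follows. Only the MAE definition is standard; the linear weighting ramp, the `threshold` and `alpha` parameters, and the assumption that users with few ratings should lean more on characteristic similarity are all hypothetical choices made for illustration, since the abstract does not specify the fusion formula.

```python
def dynamic_weight(n_ratings, threshold=20):
    """Weight given to interest similarity.

    Hypothetical rule: grows linearly with the user's rating count and is
    capped at 1.0 once the count reaches `threshold`.
    """
    return min(n_ratings / threshold, 1.0)


def hybrid_similarity(sim_rating, sim_interest, sim_feature,
                      n_ratings, alpha=0.5, threshold=20):
    """Fuse the three similarity levels into one score (assumed linear fusion)."""
    w = dynamic_weight(n_ratings, threshold)
    # Few ratings -> interest estimate is unreliable, so rely more on features.
    sim_profile = w * sim_interest + (1 - w) * sim_feature
    return alpha * sim_rating + (1 - alpha) * sim_profile


def mean_absolute_error(predicted, actual):
    """MAE: average absolute difference between predicted and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Invented numbers, for illustration only.
print(hybrid_similarity(0.8, 0.6, 0.2, n_ratings=10))
print(mean_absolute_error([4, 3, 5], [5, 3, 3]))  # (1 + 0 + 2) / 3 -> 1.0
```

A lower MAE means predicted ratings sit closer to the held-out true ratings, which is the sense in which the abstract reports reductions of about 5% and 1%.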

Key words: collaborative filtering, data sparseness, fuzzy set, rating similarity, interest similarity, characteristic similarity

CLC Number: