Journal of Computer Applications, 2016, Vol. 36, Issue (7): 2006-2010. DOI: 10.11772/j.issn.1001-9081.2016.07.2006

Section: Big Data

Collaborative filtering algorithm based on Bhattacharyya coefficient and Jaccard coefficient

YANG Jiahui (1, 2), LIU Fangai (1, 2)

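  1. College of Information Science and Engineering, Shandong Normal University, Jinan Shandong 250014, China;
    2. Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology (Shandong Normal University), Jinan Shandong 250014, China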
  • Received: 2015-12-28  Revised: 2016-03-14  Online: 2016-07-10  Published: 2016-07-14
  • Corresponding author: LIU Fangai
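  • About the authors: YANG Jiahui (1991-), female, born in Tai'an, Shandong, M.S. candidate, CCF member; research interests: data mining, personalized recommendation. LIU Fangai (1962-), male, born in Qingdao, Shandong, Ph.D., professor, doctoral supervisor; research interests: data mining, personalized recommendation, distributed computing.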
  • Supported by:
    National Natural Science Foundation of China (61572301, 90612003); Shandong Provincial Natural Science Foundation (ZR2013FM008).

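Abstract: Traditional neighborhood-based collaborative filtering recommendation algorithms suffer from data sparsity, and their similarity measures can use only the ratings that users have in common. To address these problems, a Collaborative Filtering algorithm based on Bhattacharyya coefficient and Jaccard coefficient (CFBJ) is proposed. The algorithm introduces both coefficients into item similarity measurement: the Bhattacharyya coefficient uses all of the users' rating information, removing the restriction to co-rated items, while the Jaccard coefficient increases the weight that co-rated items carry in the similarity. By making item similarity more accurate, the algorithm selects better nearest neighbors, which improves preference prediction and personalized recommendation for the target user. Experimental results show that, compared with the Mean Jaccard Difference (MJD), Pearson Correlation (PC), Jaccard and Mean Squared Difference (JMSD) and PIP (Proximity-Impact-Popularity) algorithms, the proposed algorithm has smaller error and higher classification accuracy, effectively alleviates the problems caused by sparse rating data, and improves the prediction accuracy of the recommender system.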
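The abstract does not give CFBJ's formulas. As a hedged illustration only, the sketch below computes the two ingredients for item-item similarity under common textbook definitions: the Bhattacharyya coefficient between two items' rating histograms, BC(i, j) = Σ_h sqrt(p_i(h) · p_j(h)), where p_i(h) is the share of item i's ratings equal to h, and the Jaccard coefficient of the sets of users who rated each item. The toy data, the function names, and the `combined_similarity` blend with weight `alpha` are hypothetical; the paper's actual combination rule may differ.

```python
import math
from collections import Counter

# Hypothetical toy data: {user: {item: rating}} on a 1-5 scale.
RATINGS = {
    "u1": {"a": 5, "b": 4, "c": 1},
    "u2": {"a": 4, "b": 5},
    "u3": {"c": 2, "d": 4},
    "u4": {"d": 5},
}
SCALE = range(1, 6)


def item_raters(ratings, item):
    """Return {user: rating} for every user who rated `item`."""
    return {u: r[item] for u, r in ratings.items() if item in r}


def rating_distribution(ratings, item):
    """Empirical probability of each rating value for `item`, using ALL its ratings."""
    counts = Counter(item_raters(ratings, item).values())
    total = sum(counts.values())
    if total == 0:
        return {h: 0.0 for h in SCALE}
    return {h: counts[h] / total for h in SCALE}


def bhattacharyya(ratings, i, j):
    """BC(i, j) = sum_h sqrt(p_i(h) * p_j(h)); needs no users who rated both items."""
    p_i = rating_distribution(ratings, i)
    p_j = rating_distribution(ratings, j)
    return sum(math.sqrt(p_i[h] * p_j[h]) for h in SCALE)


def jaccard(ratings, i, j):
    """|U_i & U_j| / |U_i | U_j| over the sets of users who rated each item."""
    u_i = set(item_raters(ratings, i))
    u_j = set(item_raters(ratings, j))
    union = u_i | u_j
    return len(u_i & u_j) / len(union) if union else 0.0


def combined_similarity(ratings, i, j, alpha=0.5):
    """HYPOTHETICAL weighted blend; not the paper's published combination rule."""
    return alpha * jaccard(ratings, i, j) + (1 - alpha) * bhattacharyya(ratings, i, j)


if __name__ == "__main__":
    for i, j in [("a", "b"), ("a", "c"), ("a", "d")]:
        print(i, j, round(jaccard(RATINGS, i, j), 3), round(bhattacharyya(RATINGS, i, j), 3))
```

Items `a` and `d` have no rater in common, so their Jaccard coefficient is 0, yet their BC is 1.0 because each received one 4 and one 5; this is the sense in which BC escapes the co-rating restriction. Items `a` and `c` share one rater (Jaccard 1/3) but have disjoint rating values, so their BC is 0, which is why a co-rating term such as Jaccard is still useful alongside BC.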

Key words: collaborative filtering, Bhattacharyya coefficient, Jaccard coefficient, similarity measurement, matrix sparsity

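The abstract states that CFBJ uses its improved item similarities to select nearest neighbors and predict the target user's preferences, and that it reports lower error. The sketch below shows one common item-based scheme, assumed here rather than taken from the paper: the predicted rating of user u on item i is the similarity-weighted mean of u's ratings on the k rated items most similar to i, and error is measured by mean absolute error (MAE), one usual choice since the abstract does not name its metric. The similarity table `SIM`, the neighbor count `k`, and all helper names are hypothetical placeholders; any item-item similarity function can be passed in as `sim_fn`.

```python
# Hypothetical toy data: {user: {item: rating}} on a 1-5 scale.
RATINGS = {
    "u1": {"b": 4, "c": 1, "d": 5},
    "u2": {"a": 3, "b": 2},
}

# Hypothetical precomputed item-item similarities, keyed by unordered pair.
SIM = {
    frozenset(("a", "b")): 0.9,
    frozenset(("a", "c")): 0.2,
    frozenset(("a", "d")): 0.5,
    frozenset(("b", "c")): 0.1,
    frozenset(("b", "d")): 0.4,
    frozenset(("c", "d")): 0.3,
}


def sim(i, j):
    """Look up the similarity of items i and j; unknown pairs count as 0."""
    return SIM.get(frozenset((i, j)), 0.0)


def predict(ratings, sim_fn, user, item, k=2):
    """Weighted mean of `user`'s ratings on the k rated items most similar to `item`."""
    rated = [j for j in ratings[user] if j != item]
    neighbors = sorted(rated, key=lambda j: sim_fn(item, j), reverse=True)[:k]
    num = sum(sim_fn(item, j) * ratings[user][j] for j in neighbors)
    den = sum(abs(sim_fn(item, j)) for j in neighbors)
    return num / den if den > 0 else None  # None: no usable neighbor


def mae(pairs):
    """Mean absolute error over (actual, predicted) pairs; skips missing predictions."""
    errors = [abs(actual - pred) for actual, pred in pairs if pred is not None]
    return sum(errors) / len(errors) if errors else float("nan")


if __name__ == "__main__":
    print(round(predict(RATINGS, sim, "u1", "a", k=2), 3))
```

For `u1` and item `a`, the two most similar rated items are `b` (0.9) and `d` (0.5), so the prediction is (0.9 · 4 + 0.5 · 5) / 1.4 ≈ 4.357. A more accurate similarity measure changes which items land in `neighbors` and how much each one weighs, which is the lever the abstract describes.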
