Collaborative filtering algorithm based on Bhattacharyya coefficient and Jaccard coefficient

doi:10.11772/j.issn.1001-9081.2016.07.2006

Journal of Computer Applications ›› 2016, Vol. 36 ›› Issue (7): 2006-2010. DOI: 10.11772/j.issn.1001-9081.2016.07.2006

YANG Jiahui^1,2, LIU Fangai^1,2

1. College of Information Science and Engineering, Shandong Normal University, Jinan, Shandong 250014, China;
2. Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology (Shandong Normal University), Jinan, Shandong 250014, China

Received: 2015-12-28 Revised: 2016-03-14 Online: 2016-07-10 Published: 2016-07-14
Corresponding author: LIU Fangai
About the authors: YANG Jiahui (born 1991), female, from Tai'an, Shandong, M.S. candidate, CCF member; research interests: data mining and personalized recommendation. LIU Fangai (born 1962), male, from Qingdao, Shandong, Ph.D., professor and doctoral supervisor; research interests: data mining, personalized recommendation, and distributed computing.
Supported by:
This work is partially supported by the National Natural Science Foundation of China (61572301, 90612003) and the Shandong Provincial Natural Science Foundation (ZR2013FM008).

Abstract

Abstract: Traditional neighborhood-based collaborative filtering recommendation algorithms suffer from data sparsity, and their similarity measures can use only the ratings of co-rated items. To address these problems, a Collaborative Filtering algorithm based on the Bhattacharyya coefficient and the Jaccard coefficient (CFBJ) is proposed. Both coefficients are introduced into the item similarity measure: the Bhattacharyya coefficient uses all of the ratings given by users, which removes the restriction to common ratings, while the Jaccard coefficient increases the weight of co-rated items in the similarity measure. By improving the accuracy of item similarity, the algorithm selects better nearest neighbors, which in turn improves preference prediction and personalized recommendation for the active user. Experimental results show that the proposed algorithm has smaller error and higher classification accuracy than the Mean Jaccard Difference (MJD), Pearson Correlation (PC), Jaccard and Mean Squared Difference (JMSD), and Proximity-Impact-Popularity (PIP) algorithms. It effectively alleviates the problems caused by sparse user rating data and improves the prediction accuracy of the recommender system.

Key words: collaborative filtering, Bhattacharyya coefficient, Jaccard coefficient, similarity measurement, matrix sparsity
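The two coefficients named in the abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions: the Bhattacharyya coefficient is the sum, over rating values, of the square root of the product of two rating distributions, and the Jaccard coefficient is the size of the intersection of two rater sets divided by the size of their union. The toy `ratings` data, the 1 to 5 rating scale, and all function names are assumptions made for illustration. The abstract does not say how CFBJ combines the two coefficients, so no combined similarity is shown.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data: ratings[item][user] = rating on an assumed 1-5 scale.
ratings = {
    "item_a": {"u1": 5, "u2": 4, "u3": 5},
    "item_b": {"u3": 4, "u4": 5, "u5": 5, "u6": 4},
}

RATING_VALUES = (1, 2, 3, 4, 5)  # assumed discrete rating scale


def rating_distribution(item_ratings):
    """Empirical probability of each rating value for one item."""
    counts = Counter(item_ratings.values())
    total = len(item_ratings)
    return [counts[v] / total for v in RATING_VALUES]


def bhattacharyya_coefficient(item_i, item_j):
    """Sum over rating values of sqrt(p_i(v) * p_j(v)).

    Every rating of both items contributes, not only co-rated ones.
    """
    p = rating_distribution(item_i)
    q = rating_distribution(item_j)
    return sum(sqrt(a * b) for a, b in zip(p, q))


def jaccard_coefficient(item_i, item_j):
    """Users who rated both items divided by users who rated either."""
    users_i, users_j = set(item_i), set(item_j)
    union = users_i | users_j
    return len(users_i & users_j) / len(union) if union else 0.0


bc = bhattacharyya_coefficient(ratings["item_a"], ratings["item_b"])
jac = jaccard_coefficient(ratings["item_a"], ratings["item_b"])
```

On this toy data the two items share a single rater, so the Jaccard coefficient is 1/6, while the Bhattacharyya coefficient is about 0.99 because both rating histograms concentrate on the values 4 and 5. This shows how a distribution-based measure can still report similarity when co-ratings are scarce.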

CLC Number: TP301.6

YANG Jiahui, LIU Fangai. Collaborative filtering algorithm based on Bhattacharyya coefficient and Jaccard coefficient[J]. Journal of Computer Applications, 2016, 36(7): 2006-2010.

