Journal of Computer Applications ›› 2017, Vol. 37 ›› Issue (5): 1287-1291. DOI: 10.11772/j.issn.1001-9081.2017.05.1287

• Advanced Computing •

  • Corresponding author: NI Jiancheng
  • About the authors: LI Linlin, born in Dezhou, Shandong in 1991, M. S. candidate, CCF member; her research interests include parallel and distributed computing and data mining. NI Jiancheng, born in Jining, Shandong in 1971, Ph. D., professor, CCF member; his research interests include distributed computing, machine learning and data mining. YU Pingping, born in Jinan, Shandong in 1991, M. S. candidate, CCF member; her research interests include distributed computing and data mining. YAO Binxiu, born in Weifang, Shandong in 1991, M. S. candidate, CCF member; his research interests include distributed computing, data mining and microblog recommendation. CAO Bo, born in Yichun, Heilongjiang in 1992, M. S. candidate, CCF member; her research interests include parallel and distributed computing and data mining.

Weighted Slope One algorithm based on clustering and Spark framework

LI Linlin1, NI Jiancheng2, YU Pingping1, YAO Binxiu1, CAO Bo1   

  1. College of Information Science and Engineering, Qufu Normal University, Rizhao, Shandong 276826, China;
    2. College of Software Engineering, Qufu Normal University, Qufu, Shandong 273165, China
  • Received: 2016-09-30  Revised: 2016-12-07  Online: 2017-05-10  Published: 2017-05-16
  • Supported by:
    This work is partially supported by the Youth Program of the National Natural Science Foundation of China (61402258), the Teaching Reform Research Project of Undergraduate Universities in Shandong Province (2015M102), and the Teaching Reform Research Project of Qufu Normal University (jg05021*).


Abstract: The traditional Slope One algorithm does not consider the influence of item attribute information or time factors on item similarity calculation, and recommendation in the current big data context suffers from high computational complexity and slow processing. To address these problems, a weighted Slope One algorithm based on clustering and the Spark framework was proposed. Firstly, a time weight was added to the traditional item rating similarity calculation, and item attribute similarity was introduced to produce a comprehensive item similarity. Then, the nearest-neighbor set was generated with the Canopy-K-means clustering algorithm. Finally, the data was partitioned and iteratively computed on the Spark framework to parallelize the algorithm. The experimental results show that, compared with the traditional Slope One algorithm and the weighted Slope One algorithm based on user similarity, the improved algorithm on the Spark framework achieves higher rating prediction accuracy and runs 3.5 to 5 times faster on average than on the Hadoop platform, making it more suitable for recommendation on large-scale datasets.
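The prediction step described above can be sketched as follows. This is a minimal single-machine illustration of Slope One with a time weight, not the paper's implementation: the exponential decay form, the rate `lam`, and the tiny data layout are all assumptions, since the abstract does not specify the weighting scheme.

```python
import math

def predict_weighted_slope_one(ratings, times, user, target, lam=0.01):
    """Time-weighted Slope One prediction.

    ratings: {user: {item: rating}}, times: {user: {item: timestamp}}.
    Each co-rating is discounted by exp(-lam * age), so older ratings
    contribute less to the item-to-item deviation (the exponential
    decay and the value of lam are illustrative assumptions).
    """
    t_now = max(t for u in times for t in times[u].values())
    num = den = 0.0
    for i, r_ui in ratings[user].items():
        if i == target:
            continue
        # time-weighted average deviation between the target item and item i
        dev_sum = w_sum = 0.0
        for v in ratings:
            rv, tv = ratings[v], times[v]
            if target in rv and i in rv:
                w = math.exp(-lam * (t_now - tv[i]))
                dev_sum += w * (rv[target] - rv[i])
                w_sum += w
        if w_sum > 0:
            dev = dev_sum / w_sum
            # weight each per-item estimate by its total co-rating weight
            num += w_sum * (dev + r_ui)
            den += w_sum
    return num / den if den else None
```

When all timestamps are equal, every weight is 1 and the formula reduces to the classic weighted Slope One prediction.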

Key words: Slope One algorithm, clustering, Spark, time weight, item attribute
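The Canopy pre-clustering step named in the keywords can be illustrated with a greedy one-pass sketch. The 1-D points, the distance function, and the thresholds are illustrative assumptions; in the paper's setting the resulting canopy count and centers would seed K-means over user or item vectors.

```python
def canopy(points, t1, t2, dist=lambda a, b: abs(a - b)):
    """Greedy Canopy pass: take a point as a center, collect everything
    within the loose threshold t1 as that canopy's members, and remove
    everything within the tight threshold t2 from further consideration
    (requires t1 > t2)."""
    assert t1 > t2
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining[0]
        members = [p for p in points if dist(p, center) <= t1]
        canopies.append((center, members))
        remaining = [p for p in remaining if dist(p, center) > t2]
    return canopies
```

The number of canopies gives K and the centers give the initial seeds, which is what makes Canopy a natural front end to K-means.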

CLC number: