A Weighted Slope One Algorithm based on Clustering and Spark Framework

LI Linlin, NI Jiancheng, YU Pingping, YAO Binxiu, CAO Bo

  1. Qufu Normal University
  • Received: 2016-09-30  Revised: 2016-12-06  Online: 2016-12-06
  • Corresponding author: LI Linlin

Abstract: When computing item similarity, the traditional Slope One algorithm ignores the influence of item attribute information and of rating time. In addition, in the current big-data setting, recommendation suffers from high computational complexity and slow processing. To address these problems, this paper proposes a weighted Slope One algorithm based on clustering and the Spark framework. First, a time weight is added to the traditional item rating similarity, and item attribute similarity is introduced to produce a comprehensive item similarity. Next, the Canopy-K-means clustering algorithm is used to generate the nearest-neighbor set. Finally, the Spark computing framework partitions the data and performs the iterative computation, which parallelizes the algorithm. Experimental results show that the improved Spark-based algorithm achieves higher rating prediction accuracy than both the traditional Slope One algorithm and a weighted Slope One algorithm based on user similarity, improves running efficiency by a factor of 3.5 to 5 on average compared with the Hadoop platform, and is better suited to recommendation on large-scale datasets.

Keywords: Slope One, clustering, Spark, time weight, item attribute
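The abstract names the similarity and prediction steps but gives no formulas. The Python sketch below shows one plausible reading under stated assumptions: the time weight is a hypothetical exponential decay `exp(-alpha * (t_now - t))`, rating similarity is a time-weighted cosine over co-rating users, attribute similarity is the Jaccard index of item attribute sets, the comprehensive similarity is a linear blend with a hypothetical parameter `beta`, and the weighted Slope One prediction multiplies the classic co-rating count by that similarity. None of these choices, nor the names `time_weight`, `combined_similarity` or `predict`, are taken from the paper, and the sketch runs on a single machine, whereas the paper distributes the computation with Spark.

```python
import math
from typing import Dict, Optional, Set, Tuple

# ratings[user][item] = (rating, timestamp)
Ratings = Dict[str, Dict[str, Tuple[float, float]]]


def time_weight(t: float, t_now: float, alpha: float = 0.1) -> float:
    """Hypothetical exponential decay: a rating made at t_now gets weight 1."""
    return math.exp(-alpha * (t_now - t))


def rating_similarity(ratings: Ratings, i: str, j: str, t_now: float) -> float:
    """Time-weighted cosine similarity of items i and j over users who rated both."""
    num = den_i = den_j = 0.0
    for items in ratings.values():
        if i in items and j in items:
            (r_i, t_i), (r_j, t_j) = items[i], items[j]
            w = time_weight(min(t_i, t_j), t_now)  # assumption: the older rating sets the weight
            num += w * r_i * r_j
            den_i += w * r_i * r_i
            den_j += w * r_j * r_j
    return num / math.sqrt(den_i * den_j) if den_i and den_j else 0.0


def attribute_similarity(attrs: Dict[str, Set[str]], i: str, j: str) -> float:
    """Jaccard similarity of the two items' attribute sets (e.g. genres)."""
    a, b = attrs.get(i, set()), attrs.get(j, set())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_similarity(ratings: Ratings, attrs: Dict[str, Set[str]],
                        i: str, j: str, t_now: float, beta: float = 0.5) -> float:
    """Hypothetical linear blend of rating similarity and attribute similarity."""
    return (beta * rating_similarity(ratings, i, j, t_now)
            + (1 - beta) * attribute_similarity(attrs, i, j))


def predict(ratings: Ratings, attrs: Dict[str, Set[str]], user: str, target: str,
            neighbors: Set[str], t_now: float) -> Optional[float]:
    """Similarity-weighted Slope One estimate of `user`'s rating for `target`,
    using only the user's rated items that lie in `neighbors`."""
    num = den = 0.0
    for i, (r_ui, _) in ratings[user].items():
        if i == target or i not in neighbors:
            continue
        # Average deviation dev(target, i) over users who rated both items.
        diffs = [items[target][0] - items[i][0]
                 for items in ratings.values() if target in items and i in items]
        if not diffs:
            continue
        dev = sum(diffs) / len(diffs)
        # Classic weighted Slope One weights by co-rating count; here it is scaled by similarity.
        w = len(diffs) * combined_similarity(ratings, attrs, target, i, t_now)
        num += (dev + r_ui) * w
        den += w
    return num / den if den > 0 else None


# Toy data (illustrative only): all ratings made at t = 10, so every time weight is 1.
ratings: Ratings = {
    "u1": {"A": (5, 10), "B": (3, 10), "C": (4, 10)},
    "u2": {"A": (3, 10), "B": (4, 10)},
    "u3": {"B": (2, 10), "C": (5, 10)},
}
attrs = {"A": {"drama"}, "B": {"drama", "comedy"}, "C": {"comedy"}}

print(predict(ratings, attrs, "u2", "C", neighbors={"A", "B"}, t_now=10))  # ≈ 4.976
```

On this toy data the plain count-weighted Slope One estimate would be 14/3 ≈ 4.67; weighting by the comprehensive similarity shifts the estimate toward item B, which shares the comedy attribute with C and therefore has a higher combined similarity than A.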
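For the neighbor-set step, a common form of Canopy-K-means uses a cheap Canopy pass to choose the number of clusters k and the initial centers, then refines them with K-means. The sketch below follows that pattern under assumptions not stated in the abstract: points are plain numeric vectors (for example, item feature vectors), distance is Euclidean, the thresholds `t1 > t2` are chosen by hand, and the next canopy center is simply the first remaining point rather than a random one, so the output is deterministic. The function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def canopy(points: List[Point], t1: float, t2: float) -> List[Tuple[int, List[int]]]:
    """Canopy clustering with loose threshold t1 > tight threshold t2.
    Returns (center_index, member_indices) pairs."""
    assert t1 > t2
    remaining = list(range(len(points)))
    canopies = []
    while remaining:
        c = remaining.pop(0)  # deterministic choice; often picked at random
        # Points within the loose threshold join this canopy (canopies may overlap).
        members = [c] + [p for p in remaining if math.dist(points[p], points[c]) <= t1]
        canopies.append((c, members))
        # Points within the tight threshold leave the candidate list entirely.
        remaining = [p for p in remaining if math.dist(points[p], points[c]) > t2]
    return canopies


def kmeans(points: List[Point], centers: List[Point], max_iter: int = 100):
    """Plain K-means seeded with the given centers; k = len(centers)."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                      for p in points]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        for k in range(len(centers)):
            members = [points[i] for i, lab in enumerate(labels) if lab == k]
            if members:  # keep the old center if a cluster empties
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def canopy_kmeans(points: List[Point], t1: float, t2: float):
    """Canopy picks k and the initial centers; K-means refines the clustering."""
    seeds = [points[c] for c, _ in canopy(points, t1, t2)]
    return kmeans(points, seeds)


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = canopy_kmeans(points, t1=5.0, t2=3.0)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Items that end up in the same cluster as the target item can then serve as its nearest-neighbor set, so the prediction only averages over a small, related subset of items instead of the whole catalogue.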