Journal of Computer Applications (计算机应用) ›› 2015, Vol. 35 ›› Issue (10): 2781-2783. DOI: 10.11772/j.issn.1001-9081.2015.10.2781

• Papers of the 15th China Conference on Machine Learning (CCML2015) •

Matrix factorization recommendation algorithm based on Spark

ZHENG Fengfei, HUANG Wenpei, JIA Mingzheng

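  1. School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756
  • Received: 2015-06-01  Revised: 2015-07-06  Online: 2015-10-10  Published: 2015-10-14
  • Corresponding author: ZHENG Fengfei (born 1990), male, from Huining, Gansu, master's student; main research interest: cloud computing. Email: 1207264887@qq.com
  • About the authors: HUANG Wenpei (born 1967), male, from Xi'an, Shaanxi, associate professor, Ph.D.; main research interests: information security and network security. JIA Mingzheng (born 1991), male, from Songyuan, Jilin, master's student; main research interests: data mining and cloud computing.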

Matrix factorization recommendation algorithm based on Spark

ZHENG Fengfei, HUANG Wenpei, JIA Mingzheng   

  1. School of Information Science and Technology, Southwest Jiaotong University, Chengdu, Sichuan 611756, China
  • Received: 2015-06-01  Revised: 2015-07-06  Online: 2015-10-10  Published: 2015-10-14

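Abstract: To address the bottlenecks in processing speed and computing resources that traditional matrix factorization algorithms face when handling massive data, a parallel matrix factorization algorithm on the Spark framework is proposed, exploiting Spark's strengths in in-memory and iterative computation. First, the user factor matrix and the item factor matrix are initialized from the historical data matrix. Second, the factor matrices are updated iteratively, with the result of each iteration kept in memory as the input to the next iteration. Finally, the matrix recommendation model is obtained when the iterations end. Experiments on the MovieLens dataset provided by the GroupLens website show that the speedup is linear and that the algorithm improves the execution efficiency of collaborative filtering recommendation on large-scale data.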

Keywords: collaborative filtering, recommendation algorithm, matrix factorization, alternating least squares (ALS), Spark

Abstract: To solve the bottlenecks in processing speed and computing resources that traditional matrix factorization algorithms face on massive data, a Spark-based matrix factorization recommendation algorithm was proposed. Firstly, the user factor matrix and the item factor matrix were initialized according to the historical data. Secondly, the factor matrices were iteratively updated, and the result of each iteration was stored in memory as the input of the next iteration. Finally, the recommendation model was generated when the iterations ended. The experiment on MovieLens shows that the speedup is linear and that the proposed Spark-based algorithm can save time and significantly improve the execution efficiency of the collaborative filtering recommendation algorithm.

Key words: collaborative filtering, recommendation algorithm, matrix factorization, Alternating Least Squares (ALS), Spark
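
The three steps named in the abstract (initialize the factor matrices, update them alternately, keep each result as the input of the next iteration) can be illustrated with a minimal single-machine sketch of regularized alternating least squares. This is not the authors' Spark implementation: the random initialization, the regularization weight `lam`, the rank, and all function names below are assumptions made for illustration, and the abstract does not specify them. In a Spark version, the per-user and per-item updates would presumably run in parallel over partitions, with the factor collections cached in memory between iterations.

```python
import random


def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def update_factors(ratings_by_row, fixed, rank, lam):
    """Recompute one side's factors by least squares while the other side is held fixed."""
    updated = {}
    for row_id, observed in ratings_by_row.items():
        # Normal equations: (sum_j v_j v_j^T + lam * I) u = sum_j r_j v_j
        a = [[lam if p == q else 0.0 for q in range(rank)] for p in range(rank)]
        b = [0.0] * rank
        for col_id, r in observed:
            v = fixed[col_id]
            for p in range(rank):
                b[p] += r * v[p]
                for q in range(rank):
                    a[p][q] += v[p] * v[q]
        updated[row_id] = solve(a, b)
    return updated


def rmse(ratings, users, items):
    """Root-mean-square error of the factor model on the observed ratings."""
    sq = [(r - sum(x * y for x, y in zip(users[u], items[i]))) ** 2
          for u, i, r in ratings]
    return (sum(sq) / len(sq)) ** 0.5


def als(ratings, rank=2, lam=0.1, iterations=10, seed=0):
    """ratings: list of (user_id, item_id, rating). Returns (users, items, rmse_history)."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for u, i, r in ratings:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    # Step 1: ids come from the historical ratings; random values are an assumption.
    users = {u: [rng.random() for _ in range(rank)] for u in by_user}
    items = {i: [rng.random() for _ in range(rank)] for i in by_item}
    history = [rmse(ratings, users, items)]
    # Step 2: alternate the updates; each result feeds the next iteration.
    for _ in range(iterations):
        users = update_factors(by_user, items, rank, lam)
        items = update_factors(by_item, users, rank, lam)
        history.append(rmse(ratings, users, items))
    # Step 3: the final factor matrices form the recommendation model.
    return users, items, history
```

A predicted rating for user `u` and item `i` is then the dot product of `users[u]` and `items[i]`. Because each user's update depends only on that user's ratings and the fixed item factors (and vice versa), the inner loop of `update_factors` is the part that lends itself to data-parallel execution.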

CLC number: