Journal of Computer Applications ›› 2017, Vol. 37 ›› Issue (5): 1382-1386. DOI: 10.11772/j.issn.1001-9081.2017.05.1382

• Artificial Intelligence •

Micro blog user recommendation algorithm based on similarity of multi-source information

YAO Binxiu1, NI Jiancheng2, YU Pingping1, LI Linlin1, CAO Bo1

  1. College of Information Science and Engineering, Qufu Normal University, Rizhao, Shandong 276826, China;
  2. College of Software, Qufu Normal University, Qufu, Shandong 273100, China
  • Received: 2016-10-14 Revised: 2016-11-02 Online: 2017-05-10 Published: 2017-05-16
  • Corresponding author: NI Jiancheng
  • About the authors: YAO Binxiu (born 1991), male, from Qingzhou, Shandong, M.S. candidate, CCF member; research interests: distributed computing, data mining, microblog recommendation. NI Jiancheng (born 1971), male, from Qufu, Shandong, professor, Ph.D., CCF senior member; research interests: distributed computing, machine learning, data mining. YU Pingping (born 1991), female, from Jinan, Shandong, M.S. candidate; research interests: distributed computing, data mining. LI Linlin (born 1991), female, from Dezhou, Shandong, M.S. candidate; research interests: parallel and distributed computing, data mining. CAO Bo (born 1992), female, from Yichun, Heilongjiang, M.S. candidate; research interests: parallel and distributed computing, data mining.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61402258), the Research Project of Teaching Reform in Undergraduate Colleges and Universities in Shandong Province (2015M102), and the University-level Research Project of Teaching Reform (jg05021).

Abstract: To address the data sparsity and low recommendation accuracy of traditional Collaborative Filtering (CF) recommendation algorithms, a microblog User Recommendation algorithm based on the Similarity of Multi-source Information, named MISUR, is proposed. First, microblog users are classified with the K-Nearest Neighbor (KNN) algorithm according to their tag information. Second, within each resulting class, the similarity between users is calculated separately for each information source: microblog content, interactive relationships and social information. Third, a time weight and a richness weight are introduced to combine these into a total multi-source similarity, and the TOP-N users ranked by that total similarity in descending order are recommended. Finally, experiments are carried out on the parallel computing framework Spark. The results show that, compared with the CF algorithm and the microblog Friend Recommendation algorithm based on Multi-social Behavior (MBFR), MISUR achieves considerable improvements in accuracy, recall and efficiency, which demonstrates its effectiveness.

Key words: multi-source information, sparsity, similarity, time weight, richness weight
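
The abstract does not give the formulas behind the total similarity, so the following is only a minimal Python sketch of its last two steps (combining per-source similarities and ranking the TOP-N users), not the authors' implementation. The weighted-average combination rule, the function names `total_similarity` and `top_n`, and all numeric values are assumptions made for illustration; the `weights` dictionary merely stands in for whatever the time weight and richness weight contribute in MISUR.

```python
import heapq
from typing import Dict, List, Tuple

# The three information sources named in the abstract.
SOURCES = ("content", "interaction", "social")


def total_similarity(sims: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-source similarities (assumed combination rule)."""
    weight_sum = sum(weights[s] for s in SOURCES)
    if weight_sum == 0:
        return 0.0
    return sum(weights[s] * sims[s] for s in SOURCES) / weight_sum


def top_n(
    candidates: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
    n: int,
) -> List[Tuple[str, float]]:
    """Return the n candidates with the highest total similarity, best first."""
    scored = (
        (user, total_similarity(sims, weights)) for user, sims in candidates.items()
    )
    return heapq.nlargest(n, scored, key=lambda pair: pair[1])


# Hypothetical per-source similarities between a target user and three
# candidates from the same KNN class; the numbers are made up.
candidates = {
    "u1": {"content": 0.9, "interaction": 0.1, "social": 0.2},
    "u2": {"content": 0.4, "interaction": 0.6, "social": 0.5},
    "u3": {"content": 0.2, "interaction": 0.2, "social": 0.9},
}
# Placeholder weights; in MISUR they would be derived from the time weight
# and the richness weight, whose definitions are not reproduced here.
weights = {"content": 0.5, "interaction": 0.3, "social": 0.2}

recommended = top_n(candidates, weights, n=2)
```

With these made-up numbers the ranking is `u1` followed by `u2`. `heapq.nlargest` avoids sorting the full candidate list, which is one reasonable choice when N is much smaller than the number of users in a class.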
