Micro blog user recommendation algorithm based on similarity of multi-source information

doi:10.11772/j.issn.1001-9081.2017.05.1382

Journal of Computer Applications, 2017, Vol. 37, Issue 5: 1382-1386. DOI: 10.11772/j.issn.1001-9081.2017.05.1382

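YAO Binxiu¹, NI Jiancheng², YU Pingping¹, LI Linlin¹, CAO Bo¹

1. College of Information Science and Engineering, Qufu Normal University, Rizhao Shandong 276826, China;
2. College of Software, Qufu Normal University, Qufu Shandong 273100, China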

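Received: 2016-10-14  Revised: 2016-11-02  Online: 2017-05-10  Published: 2017-05-16
Corresponding author: NI Jiancheng
About the authors: YAO Binxiu (born 1991), male, from Qingzhou, Shandong, M.S. candidate, CCF member; main research interests: distributed computing, data mining, micro blog recommendation. NI Jiancheng (born 1971), male, from Qufu, Shandong, professor, Ph.D., CCF senior member; main research interests: distributed computing, machine learning, data mining. YU Pingping (born 1991), female, from Jinan, Shandong, M.S. candidate; main research interests: distributed computing, data mining. LI Linlin (born 1991), female, from Dezhou, Shandong, M.S. candidate; main research interests: parallel and distributed computing, data mining. CAO Bo (born 1992), female, from Yichun, Heilongjiang, M.S. candidate; main research interests: parallel and distributed computing, data mining.
Supported by:
National Natural Science Foundation of China (61402258); Research Project of Teaching Reform in Undergraduate Colleges and Universities in Shandong Province (2015M102); University-level Research Project of Teaching Reform (jg05021).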

Abstract

Abstract: To address the data sparsity and low recommendation accuracy of traditional Collaborative Filtering (CF) recommendation algorithms, a micro blog User Recommendation algorithm based on the Similarity of Multi-source Information, named MISUR, was proposed. Firstly, micro blog users were classified by the K-Nearest Neighbor (KNN) algorithm according to their tag information. Secondly, within each resulting class, the similarity of each kind of multi-source information (micro blog content, interactive relationship and social information) was calculated between users. Thirdly, a time weight and a richness weight were introduced to calculate the total similarity of the multi-source information, and TOP-N user recommendation was performed in descending order of total similarity. Finally, experiments were carried out on the parallel computing framework Spark. The experimental results show that, compared with the CF recommendation algorithm and the micro blog Friend Recommendation algorithm based on Multi-social Behavior (MBFR), MISUR achieves considerably better accuracy, recall and efficiency, which demonstrates its effectiveness.

Key words: multi-source information, sparsity, similarity, time weight, richness weight
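
As an illustration only, the last two steps of the pipeline described in the abstract (combining per-source similarities into a total similarity, then TOP-N recommendation) can be sketched in Python as follows. The abstract does not give the similarity measures or the formulas for the time weight and richness weight, so everything concrete here is an assumption: cosine similarity for content, Jaccard similarity for the interaction and social sets, fixed per-source `weights` standing in for the time and richness weights, and the hypothetical names `total_similarity` and `recommend_top_n`. The KNN classification step and the Spark parallelization are omitted; all candidates are assumed to already belong to the same class as the target user.

```python
import math
from typing import Dict, List, Set, Tuple


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def total_similarity(u: dict, v: dict, weights: Dict[str, float]) -> float:
    """Weighted average of the three per-source similarities.

    `weights` is a hypothetical stand-in for the paper's time and
    richness weights, whose formulas are not given in the abstract.
    """
    sims = {
        "content": cosine(u["content"], v["content"]),
        "interaction": jaccard(u["interaction"], v["interaction"]),
        "social": jaccard(u["social"], v["social"]),
    }
    weight_sum = sum(weights[k] for k in sims)
    if weight_sum == 0:
        return 0.0
    return sum(weights[k] * sims[k] for k in sims) / weight_sum


def recommend_top_n(
    target: str, users: Dict[str, dict], weights: Dict[str, float], n: int
) -> List[Tuple[str, float]]:
    """Rank all other users by total similarity and keep the top n."""
    scored = [
        (name, total_similarity(users[target], profile, weights))
        for name, profile in users.items()
        if name != target
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Toy data (invented for illustration, not from the paper's experiments).
users = {
    "alice": {"content": {"spark": 2.0, "ml": 1.0},
              "interaction": {"carol"}, "social": {"dave", "erin"}},
    "bob": {"content": {"spark": 1.0, "ml": 1.0},
            "interaction": {"carol"}, "social": {"dave"}},
    "frank": {"content": {"cooking": 3.0},
              "interaction": {"gina"}, "social": {"erin"}},
}
weights = {"content": 0.5, "interaction": 0.3, "social": 0.2}
ranking = recommend_top_n("alice", users, weights, n=2)
```

With this toy data, `bob` ranks ahead of `frank` for `alice`: `bob` shares content terms, an interaction partner and a followee with `alice`, whereas `frank` shares only one followee.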

CLC number: TP301.6

YAO Binxiu, NI Jiancheng, YU Pingping, LI Linlin, CAO Bo. Micro blog user recommendation algorithm based on similarity of multi-source information[J]. Journal of Computer Applications, 2017, 37(5): 1382-1386.
