Efficient approach for selecting key users in large-scale social networks

doi:10.11772/j.issn.1001-9081.2017.11.3101

Journal of Computer Applications, 2017, Vol. 37, Issue (11): 3101-3106.

The 16th Chinese Conference on Machine Learning (CCML 2017)


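ZHENG Yongguang, YUE Kun, YIN Zidu, ZHANG Xuejie

School of Information Science and Engineering, Yunnan University, Kunming Yunnan 650500, China

Received: 2017-05-16 Revised: 2017-06-05 Online: 2017-11-11 Published: 2017-11-10
Corresponding author: YUE Kun
About the authors: ZHENG Yongguang (1988-), male, born in Xingtai, Hebei, M.S. candidate; research interests: massive data analysis and services. YUE Kun (1979-), male, born in Qujing, Yunnan, Ph.D., professor, doctoral supervisor, CCF senior member; research interests: massive data analysis and services. YIN Zidu (1990-), male, born in Lanzhou, Gansu, Ph.D. candidate; research interests: massive data analysis and services. ZHANG Xuejie (1965-), male, born in Kunming, Yunnan, Ph.D., professor, doctoral supervisor; research interests: distributed computing.
Supported by:
National Natural Science Foundation of China (61472345, 61562090); Key Program of Natural Science Foundation of Yunnan Province (2014FA023); Program for the Second Batch of Yunling Scholars of Yunnan Province (C6153001); Program for Excellent Young Talents of Yunnan University (WX173602); Research Foundation of Educational Department of Yunnan Province (2016ZZX006, 2016YJS005).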


Abstract

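Abstract: To quickly and effectively select key users with strong information dissemination capability from a large-scale social network and the historical data of messages posted by its users, an approach for selecting key users was proposed. Firstly, the structural information of the social network was used to construct a directed graph with users as nodes; then, from the users' historical messages and based on the Spark computing framework, the weights characterized by user activity, retweet interaction degree and information-quantity proportion were calculated quantitatively, yielding a directed weighted graph model of the social network. Then, drawing on the PageRank algorithm, a measure of users' information dissemination capability was established, and a Spark-based method for computing this capability in large-scale social networks was given. Furthermore, a Spark-based d-distance selection algorithm was given, which selects key users over multiple iterations so that the information dissemination ranges of different key users overlap as little as possible. The experimental results on Sina Weibo data show that the proposed approach is efficient, feasible and scalable, and can provide some support for controlling the spread of harmful breaking information and for monitoring public opinion in social networks.

Keywords: large-scale social network, information dissemination capability, key user, PageRank, Spark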
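The abstract names the three factors behind the weights (user activity, retweet interaction degree and information-quantity proportion) and says the capability measure borrows from PageRank, but it gives no formulas. The following is a minimal single-machine sketch in plain Python, not the authors' Spark implementation. It assumes edges point from follower to followee (so rank accumulates at users who are followed and retweeted), and the equal-weight combination of the three factors, the toy data, and the helper names `build_weighted_graph` and `weighted_pagerank` are all hypothetical.

```python
from collections import defaultdict


def build_weighted_graph(follows, posts, retweets, coeffs=(1 / 3, 1 / 3, 1 / 3)):
    """Return {follower: {followee: weight}} for edges follower -> followee.

    HYPOTHETICAL weighting (not the paper's formula): a convex combination of
    the followee's activity, the follower's retweet share for that followee,
    and the followee's share of the messages the follower receives.
    """
    a, b, c = coeffs
    max_posts = max(posts.values(), default=0) or 1
    followees = defaultdict(list)
    for follower, followee in follows:
        if followee not in followees[follower]:  # ignore duplicate edges
            followees[follower].append(followee)

    graph = {}
    for v, outs in followees.items():
        total_rt = sum(retweets.get((v, u), 0) for u in outs)
        total_info = sum(posts.get(u, 0) for u in outs)
        graph[v] = {}
        for u in outs:
            activity = posts.get(u, 0) / max_posts  # user activity
            interaction = retweets.get((v, u), 0) / total_rt if total_rt else 0.0
            info_share = posts.get(u, 0) / total_info if total_info else 0.0
            graph[v][u] = a * activity + b * interaction + c * info_share
    return graph


def weighted_pagerank(graph, damping=0.85, tol=1e-10, max_iter=500):
    """Edge-weighted PageRank: v passes rank to u in proportion to w(v, u)."""
    nodes = set(graph)
    for outs in graph.values():
        nodes.update(outs)
    nodes = sorted(nodes)
    n = len(nodes)
    if n == 0:
        return {}
    out_sum = {v: sum(graph.get(v, {}).values()) for v in nodes}
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Rank held by nodes with no positive out-weight is spread uniformly.
        dangling = sum(rank[v] for v in nodes if out_sum[v] <= 0)
        new = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        for v, outs in graph.items():
            if out_sum[v] > 0:
                for u, w in outs.items():
                    new[u] += damping * rank[v] * w / out_sum[v]
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Toy data (made up): b, c and d follow a; a follows b; d follows c.
follows = [("b", "a"), ("c", "a"), ("d", "a"), ("a", "b"), ("d", "c")]
posts = {"a": 10, "b": 3, "c": 2, "d": 1}
retweets = {("b", "a"): 5, ("c", "a"): 4, ("d", "a"): 2, ("d", "c"): 1}

scores = weighted_pagerank(build_weighted_graph(follows, posts, retweets))
for user, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(user, round(score, 4))
```

In the toy data, user `a` is followed and retweeted by everyone else and ends up with the largest score, while `d`, whom nobody follows, keeps only the teleport share (1 - 0.85)/4 = 0.0375. Normalizing each user's out-weights makes the scores sum to 1, as in ordinary PageRank.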

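The abstract describes the d-distance selection only at a high level: users are chosen over several iterations so that the dissemination ranges of the chosen users overlap as little as possible. One plausible reading, sketched here as an assumption rather than the published algorithm, is a greedy loop: take the highest-scoring user not yet covered, mark everyone within d hops of that user (along the direction in which messages spread) as covered, and skip covered users in later iterations. The `spread` adjacency (user to the users who receive that user's messages), the toy scores, and the function names are hypothetical.

```python
from collections import deque


def within_d_hops(spread, source, d):
    """Users reachable from `source` in at most d hops (BFS), source included."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        user, dist = frontier.popleft()
        if dist == d:
            continue  # do not expand beyond distance d
        for nxt in spread.get(user, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return seen


def select_key_users(scores, spread, k, d):
    """HYPOTHETICAL greedy d-distance selection.

    Each iteration picks the best-scoring user not yet covered, then marks
    every user within d hops of it as covered, so later picks come from
    outside the ranges already reached by selected users.
    """
    ranked = sorted(scores, key=lambda u: (-scores[u], u))  # ties broken by id
    selected, covered = [], set()
    for user in ranked:
        if len(selected) == k:
            break
        if user in covered:
            continue
        selected.append(user)
        covered |= within_d_hops(spread, user, d)
    return selected


# Toy data (made up): spread[u] lists users who receive u's messages.
scores = {"a": 0.5, "b": 0.4, "c": 0.05, "e": 0.03, "f": 0.02}
spread = {"a": ["b", "c"], "b": ["a"], "e": ["f"]}

print(select_key_users(scores, spread, k=2, d=1))
print(select_key_users(scores, spread, k=2, d=0))
```

With d = 1, selecting `a` covers `b` and `c`, so the second pick is `e`; with d = 0 only the picked users themselves are covered and the result falls back to the plain top-k ranking `["a", "b"]`. Larger d trades raw capability for wider separation between the chosen users.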

CLC number: TP391.41

ZHENG Yongguang, YUE Kun, YIN Zidu, ZHANG Xuejie. Efficient approach for selecting key users in large-scale social networks[J]. Journal of Computer Applications, 2017, 37(11): 3101-3106.


