Journal of Computer Applications ›› 2017, Vol. 37 ›› Issue (11): 3101-3106. DOI: 10.11772/j.issn.1001-9081.2017.11.3101

• The 16th China Conference on Machine Learning (CCML 2017) •

Efficient approach for selecting key users in large-scale social networks

ZHENG Yongguang (郑永广), YUE Kun (岳昆), YIN Zidu (尹子都), ZHANG Xuejie (张学杰)

  1. School of Information Science and Engineering, Yunnan University, Kunming Yunnan 650500, China
  • Received: 2017-05-16 Revised: 2017-06-05 Online: 2017-11-10 Published: 2017-11-11
  • Corresponding author: YUE Kun
  • About the authors: ZHENG Yongguang (1988-), male, native of Xingtai, Hebei, M.S. candidate, main research interests: massive data analysis and services; YUE Kun (1979-), male, native of Qujing, Yunnan, professor, doctoral supervisor, Ph.D., CCF senior member, main research interests: massive data analysis and services; YIN Zidu (1990-), male, native of Lanzhou, Gansu, Ph.D. candidate, main research interests: massive data analysis and services; ZHANG Xuejie (1965-), male, native of Kunming, Yunnan, professor, doctoral supervisor, Ph.D., main research interests: distributed computing.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61472345, 61562090), the Key Program of Natural Science Foundation of Yunnan Province (2014FA023), the Program for the Second Batch of Yunling Scholars of Yunnan Province (C6153001), the Program for Excellent Young Talents of Yunnan University (WX173602), and the Research Foundation of Educational Department of Yunnan Province (2016ZZX006, 2016YJS005).

Abstract: To select key users with strong information dissemination capacity quickly and effectively from a large-scale social network and the historical data of messages posted by its users, an approach for selecting key users was proposed. Firstly, the structural information of the social network was used to construct a directed graph with users as nodes. Based on the Spark computing framework, weights characterized by user activity, retweet interaction and information proportion were then quantitatively computed from the historical message data, so as to construct a directed weighted graph model of the social network. Secondly, a measure of users' information dissemination capacity was established by drawing on the PageRank algorithm, and a Spark-based algorithm was given to compute this capacity in large-scale social networks. Furthermore, a Spark-based d-distance selection algorithm was given, which makes the information dissemination ranges of the selected key users overlap as little as possible through multiple iterations. The experimental results on Sina Weibo datasets show that the proposed approach is efficient, feasible and scalable, and can support, to a certain extent, controlling the spread of harmful sudden information and monitoring public opinion in social networks.

Key words: large-scale social network, information dissemination capacity, key user, PageRank, Spark

CLC Number:
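
The two algorithmic steps summarized in the abstract, a PageRank-style score on a directed weighted graph followed by d-distance selection, can be illustrated with a minimal single-machine sketch. It is not the authors' Spark implementation, and the abstract does not give their formulas, so everything below is an assumption: an edge (u, v) is taken to mean that u follows or retweets v, edge weights are supplied directly rather than derived from activity, retweet interaction and information proportion, and "d-distance" is read as skipping any candidate within d hops (ignoring direction) of an already selected user. The names `weighted_pagerank`, `within_d_hops` and `select_key_users` are hypothetical.

```python
from collections import deque


def weighted_pagerank(edges, damping=0.85, iterations=50):
    """PageRank on a weighted directed graph.

    edges maps (src, dst) -> positive weight; rank flows from src to dst
    in proportion to the edge's share of src's total outgoing weight.
    """
    nodes = sorted({n for edge in edges for n in edge})
    out_weight = {n: 0.0 for n in nodes}
    for (src, _dst), w in edges.items():
        out_weight[src] += w
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-edges is spread uniformly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0.0)
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        for (src, dst), w in edges.items():
            new_rank[dst] += damping * rank[src] * w / out_weight[src]
        rank = new_rank
    return rank


def within_d_hops(adjacency, source, d):
    """Breadth-first search: nodes at most d hops from source (inclusive)."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == d:
            continue  # do not expand beyond d hops
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return seen


def select_key_users(edges, k, d):
    """Greedily pick up to k top-ranked users that are more than d hops apart."""
    rank = weighted_pagerank(edges)
    adjacency = {}  # undirected view, an assumption of this sketch
    for src, dst in edges:
        adjacency.setdefault(src, set()).add(dst)
        adjacency.setdefault(dst, set()).add(src)
    selected, blocked = [], set()
    # Highest rank first; ties broken by name for determinism.
    for user in sorted(rank, key=lambda n: (-rank[n], n)):
        if len(selected) == k:
            break
        if user in blocked:
            continue
        selected.append(user)
        blocked |= within_d_hops(adjacency, user, d)
    return selected
```

With d = 0 the selection reduces to a plain top-k by score; a larger d trades some score for less overlap between the neighbourhoods of the chosen users, which is the effect the abstract attributes to the d-distance step.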