Journal of Computer Applications ›› 2015, Vol. 35 ›› Issue (9): 2565-2568. DOI: 10.11772/j.issn.1001-9081.2015.09.2565

• Data Technology •

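Community detection model in large scale academic social networks

LI Chunying1,2, TANG Yong2, TANG Zhikang3, HUANG Yonghang2, YUAN Chengzhe2, ZHAO Jiandong1

  1. Computer Network Center, Guangdong Polytechnic Normal University, Guangzhou Guangdong 510665, China;
  2. School of Computer, South China Normal University, Guangzhou Guangdong 510631, China;
  3. School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou Guangdong 510665, China
  • Received: 2015-04-20 Revised: 2015-06-18 Online: 2015-09-10 Published: 2015-09-17
  • Corresponding author: TANG Zhikang (born 1978), male, from Linyi, Shandong; lecturer, M.S.; research interests: social networks and big data applications, community detection. fzutang@126.com
  • About the authors: LI Chunying (born 1978), female, from Qiqihar, Heilongjiang; associate professor, Ph.D. candidate, CCF member; research interests: social networks and big data applications, service computing, community detection. TANG Yong (born 1964), male, from Zhangjiajie, Hunan; professor, doctoral supervisor, Ph.D., CCF member; research interests: information retrieval and data mining, collaborative computing and mobile Internet applications.
  • Supported by:
    National High Technology Research and Development Program (863 Program) of China (2013AA01A212); National Natural Science Foundation of China (61272067, 61370229); Natural Science Foundation of Guangdong Province, Research Team Project (S2012030006242); Natural Science Foundation of Guangdong Province, Doctoral Research Start-up Project (2014A030310238); Characteristic Innovation Project of the Department of Education of Guangdong Province (2014WTSCX078); University-level Project of Guangdong Polytechnic Normal University (2014).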

Abstract: Label-propagation algorithms for overlapping community detection in complex networks depend on parameters that must be supplied in advance, which limits them on real networks, and they also produce redundant labels. To address these problems, a label-propagation-based community detection model for large-scale academic social networks was proposed. The model first finds mutually disjoint Utmost Maximal Cliques (UMC) in the network and gives all nodes of each UMC one unique label, which reduces redundant labels and random factors and improves the efficiency and stability of community detection. During label updating, each UMC acts as a core unit: the labels and weights of nodes adjacent to a UMC are updated outward from the core according to closeness, while the weights of nodes not adjacent to any UMC are updated according to the maximum weight among their neighbors. In the post-processing stage, an adaptive threshold removes noise from the node labels, which overcomes the limitation of having to specify the number of overlapping communities in advance on real networks. Experiments on a data set from SCHOLAT, an academic social networking platform, show that the model can assign nodes with certain common characteristics to the same community, and that it supports more precise personalized services on the platform, such as friend recommendation and paper sharing.

Key words: social network, community detection, overlapping community, label propagation, Utmost Maximal Clique (UMC), adaptive threshold
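
The pipeline outlined in the abstract (disjoint clique seeds, label spreading, threshold-based pruning) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the exact UMC selection rule, the closeness formula, or the adaptive threshold, so the sketch substitutes simple stand-ins. It picks seeds greedily from the largest maximal cliques, weights a label by the fraction of a node's neighbors inside the seed clique, performs a single propagation pass, and uses each node's mean label weight as its cutoff. All function names and the toy edge list are hypothetical.

```python
# Toy graph: two triangles plus two nodes that touch both (hypothetical data).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5),
         (2, 6), (3, 6), (0, 7), (1, 7), (5, 7)]


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with basic Bron-Kerbosch (small graphs only)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def disjoint_seed_cliques(adj, min_size=3):
    """Assumed stand-in for UMC selection: greedily keep the largest
    maximal cliques that share no node with an already chosen one."""
    used, seeds = set(), []
    ordered = sorted(maximal_cliques(adj), key=lambda c: (-len(c), sorted(c)))
    for clique in ordered:
        if len(clique) >= min_size and not (clique & used):
            seeds.append(clique)
            used |= clique
    return seeds


def propagate(adj, seeds):
    """One pass only: seed nodes share their clique's label with weight 1.0;
    every other node weights a label by the fraction of its neighbors that
    lie in that seed clique (assumed closeness measure)."""
    labels = {}
    for label, clique in enumerate(seeds):
        for v in clique:
            labels[v] = {label: 1.0}
    for v in adj:
        if v in labels:
            continue
        weights = {}
        for label, clique in enumerate(seeds):
            share = len(adj[v] & clique) / len(adj[v]) if adj[v] else 0.0
            if share > 0:
                weights[label] = share
        labels[v] = weights
    return labels


def prune(labels):
    """Assumed adaptive rule: per node, keep labels whose weight reaches
    that node's own mean label weight, so no global cutoff is supplied."""
    pruned = {}
    for v, weights in labels.items():
        if not weights:
            pruned[v] = set()
            continue
        cutoff = sum(weights.values()) / len(weights)
        pruned[v] = {l for l, w in weights.items() if w >= cutoff - 1e-12}
    return pruned


if __name__ == "__main__":
    adj = build_adjacency(EDGES)
    seeds = disjoint_seed_cliques(adj)
    for node, kept in sorted(prune(propagate(adj, seeds)).items()):
        print(node, sorted(kept))
```

On this toy graph the two triangles become the seeds. Node 6, which has one neighbor in each seed, keeps both labels and so belongs to both communities, while node 7 keeps only the label of the seed that holds two of its three neighbors. A full model would repeat the propagation step until the labels stabilize and would also handle nodes that are not adjacent to any seed.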

CLC number: