计算机应用 ›› 2020, Vol. 40 ›› Issue (12): 3578-3585.DOI: 10.11772/j.issn.1001-9081.2020060942

• 网络与通信 • 上一篇    下一篇

融合标签预处理与节点影响力的重叠社区发现算法

吴清寿1,2, 陈荣旺1, 余文森1,2, 刘耿耿3   

  1. 1. 武夷学院 数学与计算机学院, 福建 武夷山 354300;
    2. 认知计算与智能信息处理福建省高校重点实验室(武夷学院), 福建 武夷山 354300;
    3. 福州大学 数学与计算机科学学院, 福州 350116
  • 收稿日期:2020-05-31 修回日期:2020-07-29 出版日期:2020-12-10 发布日期:2020-08-11
  • 通讯作者: 余文森(1973-),男,福建南平人,教授,博士,CCF会员,主要研究方向:机器学习、图像处理。45509111@qq.com
  • 作者简介:吴清寿(1977-),男,福建莆田人,副教授,硕士,CCF会员,主要研究方向:机器学习、复杂网络分析;陈荣旺(1970-),男,福建南平人,高级工程师,硕士,主要研究方向:机器学习、复杂网络分析;刘耿耿(1988-),男,福建泉州人,副教授,博士生导师,博士,CCF会员,主要研究方向:智能计算
  • 基金资助:
    国家自然科学基金资助项目(61877010);福建省自然科学基金资助项目(2019J01835)。

Overlapping community detection algorithm fusing label preprocessing and node influence

WU Qingshou1,2, CHEN Rongwang1, YU Wensen1,2, LIU Genggeng3   

  1. 1. School of Mathematics and Computer Science, Wuyi University, Wuyishan Fujian 354300, China;
    2. Key Laboratory of Cognitive Computing and Intelligent Information Processing of Fujian Education Institutions(Wuyi University), Wuyishan Fujian 354300, China;
    3. College of Mathematics and Computer Science, Fuzhou University, Fuzhou Fujian 350116, China
  • Received:2020-05-31 Revised:2020-07-29 Online:2020-12-10 Published:2020-08-11
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61877010), the Natural Science Foundation of Fujian Province (2019J01835).

摘要: 针对节点初始标签散乱及标签传播随机性大的问题,提出一种融合标签预处理与节点影响力的重叠社区发现算法。首先,计算节点影响力,逐步选择影响力值最大的节点作为中心节点;然后,用中心节点的标签对同质的邻居节点进行标签预处理,减少了初始标签数量,降低了后续标签传播的随机性,并初步识别出了重叠节点;其次,通过标签隶属系数识别重叠节点,用节点影响力值选择非重叠节点标签,提高了算法的稳定性和准确性;最后,以最大化自适应函数增量为目标,对内聚度弱的社区进行合并,提高了社区质量。仿真实验结果表明:对于六个真实网络,所提算法在50%的数据集上具有最大的扩展模块度值;而在不同混合度、节点重叠度和节点最大归属社区数的人工基准网络上,该算法在标准化互信息(NMI)指标上都具有最好的性能。综上所述,该算法对各类网络都具有较好的适应性,且具有接近线性的时间复杂度。

关键词: 重叠社区, 中心节点, 标签传播, 节点影响力, 标签隶属系数

Abstract: Aiming at the problem of scattered initial labels and large randomness of label propagation, an overlapping community detection algorithm fusing label preprocessing and node influence was proposed. Firstly, the influence value of each node was calculated, and the node with the largest influence value was selected as the central node gradually. Secondly, the label of the central node was used to preprocess the labels of the homogeneous neighbor nodes, so as to reduce the number of initial labels as well as the randomness of subsequent label propagation, and preliminarily identify the overlapping nodes. Thirdly, the overlapping nodes were identified by the label belonging coefficient, and the labels of non-overlapping nodes were selected by the node influence values, improving the stability and accuracy of the proposed algorithm. Finally, in order to maximize the increment of the adaptive function, the communities with weak cohesion were merged together to improve the quality of communities. The simulation experimental results show that the proposed algorithm has the largest extended modularity value on 50% datasets of the six real networks, and has the best performance in Normalized Mutual Information (NMI) index on the artificial benchmark networks with different mixing degrees, overlapping degrees of node and the maximum numbers of communities to which the node belongs. In conclusion, the algorithm has good adaptability to all kinds of networks, and has nearly linear time complexity.

Key words: overlapping community, central node, label propagation, node influence, label belonging coefficient

中图分类号: