Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (11): 3396-3402.DOI: 10.11772/j.issn.1001-9081.2023101536

• Data science and technology • Previous Articles     Next Articles

Non-overlapping community detection with imbalanced community sizes

Shiliang LIU, Yi WANG, Yinglong MA()   

  1. School of Control and Computer Engineering,North China Electric Power University,Beijing 102206,China
  • Received:2023-11-10 Revised:2024-01-11 Accepted:2024-01-19 Online:2024-01-01 Published:2024-11-10
  • Contact: Yinglong MA
  • About author:LIU Shiliang, born in 1998, M. S. candidate. His research interests include community detection.
    WANG Yi, born in 2000, M. S. candidate. His research interests include social recommendation.
  • Supported by:
    National Natural Science Foundation of China(62072450);Science and Technology Project of State Grid(SGGSXT00XMJS2250023)

考虑社区规模不平衡的非重叠社区检测

刘世梁, 王义, 马应龙()   

  1. 华北电力大学 控制与计算机工程学院,北京 102206
  • 通讯作者: 马应龙
  • 作者简介:刘世梁(1998—),男,甘肃武威人,硕士研究生,主要研究方向:社区检测
    王义(2000—),男,河北沧州人,硕士研究生,主要研究方向:社交推荐
  • 基金资助:
    国家自然科学基金资助项目(62072450);国家电网公司科技项目(SGGSXT00XMJS2250023)

Abstract:

Community detection helps to comprehend the complex structure of social networks, but most of the existing community detection methods do not consider the imbalanced sizes of communities to detect, and the discovered community structures are relatively single with low accuracy. Therefore, a non-overlapping community detection method based on Local Expansion of Initial Community Structure (LEICS) was proposed. LEICS was divided into three stages: in the first stage, the initial community structures with different scales were detected by utilizing the hierarchical structure information and local structure information of the network; in the second stage, the initial community was expanded by calculating the connection intensity between the node and the nodes in the community and the modularity contribution of the node, and then using the Label Propagation Algorithm (LPA) to deal with the rest of the nodes; in the third stage, for unstable communities with size smaller than the average community size, the nodes were redistributed to further optimize the results of community detection. Experimental results on twelve datasets of real-world networks and Lancichinetti-Fortunato-Radicchi (LFR) simulated networks show that compared to the suboptimal Local Balanced Label Diffusion (LBLD) algorithm, LEICS improves the Normalized Mutual Information (NMI) by at least 5 percentage points on Polbooks and YouTube networks, and the accuracy and robustness of LEICS in both small-size and large-size networks are fully validated, proving that LEICS can adapt to the imbalance of community size.

Key words: social network, community detection, initial community structure, expansion strategy, label propagation

摘要:

社区检测有助于理解社交网络错综复杂的结构,但现有的大部分社区检测方法并未考虑社交网络中社区规模的不平衡,发现的社区结构较为单一且准确率较低。为此,提出基于初始社区结构局部扩展的社区检测方法(LEICS)。LEICS分为3个阶段:第一阶段,充分利用网络的层次结构信息和局部结构信息发现不同规模的初始社区结构;第二阶段,通过计算节点与社区内节点的连接强度和节点对社区的模块度贡献扩展初始社区,再利用标签传播算法(LPA)处理剩余节点;第三阶段,重新分配小于平均社区大小的不稳定社区中的节点,以进一步优化社区检测的结果。在12个真实世界网络和兰奇基内蒂-福图纳托-拉迪奇(LFR)仿真网络上的实验结果表明,相较于次优的局部平衡标签扩散(LBLD)算法,LEICS在Polbooks和YouTube网络上的归一化互信息(NMI)提高了至少5个百分点,它的准确性和鲁棒性在小规模和大规模网络中得到充分验证,这表明LEICS可以适应社区规模的不平衡。

关键词: 社交网络, 社区检测, 初始社区结构, 扩展策略, 标签传播

CLC Number: