Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (12): 3423-3429.DOI: 10.11772/j.issn.1001-9081.2020061048

• 2020 China Conference on Granular Computing and Knowledge Discovery(CGCKD 2020) • Previous Articles     Next Articles

Improved label propagation algorithm based on random walk

ZHENG Wenping1,2, YUE Xiangdou1,2, YANG Gui1,2   

  1. 1. School of Computer and Information Technology, Shanxi University, Taiyuan Shanxi 030006, China;
    2. Key Laboratory of Computational Intelligence and Chinese Information Processing, Ministry of Education(Shanxi University), Taiyuan Shanxi 030006, China
  • Received:2020-06-12 Revised:2020-09-17 Online:2020-12-10 Published:2020-11-10
  • Supported by:
    This work is partially supported by the Natural Science Foundation of Shanxi Province (201801D121123), the Research Project of the Shanxi Scholarship Council of China (2017-014).

基于随机游走的改进标签传播算法

郑文萍1,2, 岳香豆1,2, 杨贵1,2   

  1. 1. 山西大学 计算机与信息技术学院, 太原 030006;
    2. 计算智能与中文信息处理教育部重点实验室(山西大学), 太原 030006
  • 通讯作者: 郑文萍(1979-),女,山西晋中人,副教授,博士,CCF会员,主要研究方向:图论算法、生物信息学。wpzheng@sxu.edu.cn
  • 作者简介:岳香豆(1993-),女,山西运城人,硕士研究生,主要研究方向:复杂网络社区发现;杨贵(1975-),男,山西大同人,博士研究生,主要研究方向:生物信息学
  • 基金资助:
    山西省自然科学基金资助项目(201801D121123);山西省回国留学人员科研基金资助项目(2017-014)。

Abstract: Community detection is a useful tool for mining hidden information in social networks. And Label Propagation Algorithm (LPA) is a common algorithm in the community detection algorithm, which does not require any prior knowledge and runs fast. Aiming at the problem of the instability of community detection algorithm results caused by the strong randomness of label propagation algorithm, an improved Label Propagation Algorithm based on Random Walk (LPARW) was proposed. Firstly, according to the random walk on the network, the importance order of nodes was determined, so as to obtain the update order of nodes. Secondly, the update sequence of nodes was traversed, and the similarity calculation between each node and the node before it was performed. If the node and the node before it were neighbor nodes and the similarity between them was greater than the threshold, then the node before it was selected as the seed node. Finally, the label of the seed node was propagated to the rest of the nodes in order to obtain the final division result of the communities. The proposed algorithm was comparatively analyzed with some classic label propagation algorithms on 4 labeled networks and 5 unlabeled real networks. Experimental results show that the proposed algorithm is better than other comparison algorithms on classic evaluation indicators such as Normalized Mutual Information (NMI), Adjusted Rand Index (ARI) and modularity. It can be seen that the proposed algorithm has the good community division effect.

Key words: complex network, community detection, Label Propagation Algorithm (LPA), random walk, seed expansion strategy

摘要: 社区发现是挖掘社交网络隐藏信息的一个有用的工具,而标签传播算法(LPA)是社区发现算法中的一种常见算法,不需要任何的先验知识,且运行速度快。针对标签传播算法有很强的随机性而导致的社区发现算法结果不稳定的问题,提出了一种基于随机游走的改进标签传播算法(LPARW)。首先,根据在网络上进行随机游走确定了节点重要性的排序,从而得到节点的更新顺序;然后,遍历节点的更新序列,对每个节点将其与排序在其之前的节点进行相似性计算,若该节点与排序在其之前的节点是邻居节点且它们之间的相似性大于阈值,则将排序在其之前的节点选为种子节点;最后,将种子节点的标签传播给其余的节点,得到社区的最终划分结果。将所提算法与一些经典的标签传播算法在4个有标签的网络和5个无标签的真实网络上进行比较分析,实验结果表明所提算法在标准互信息(NMI)、调整兰德系数(ARI)和模块度等经典的评价指标上的性能均优于其余对比算法,可见该算法具有很好的社区划分效果。

关键词: 复杂网络, 社区发现, 标签传播算法, 随机游走, 种子扩展策略

CLC Number: