Journal of Computer Applications ›› 2017, Vol. 37 ›› Issue (11): 3085-3089.DOI: 10.11772/j.issn.1001-9081.2017.11.3085

Previous Articles     Next Articles

Semi-supervised community detection algorithm using active link selection based on iterative framework

CHEN Yiying, CHAI Bianfang, LI Wenbin, HE Yichao, WU Congcong   

  1. School of Information Engineering, Hebei GEO University, Shijiazhuang Hebei 050031, China
  • Received:2017-05-16 Revised:2017-06-05 Online:2017-11-10 Published:2017-11-11
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61503260).


陈嶷瑛, 柴变芳, 李文斌, 贺毅朝, 吴聪聪   

  1. 河北地质大学 信息工程学院, 石家庄 050031
  • 通讯作者: 陈嶷瑛
  • 作者简介:陈嶷瑛(1971-),女,湖南宁远人,教授,博士,主要研究方向:社区发现、机器学习;柴变芳(1979-),女,山西运城人,副教授,博士,主要研究方向:复杂网络分析、概率图模型、机器学习、社区发现;李文斌(1974-),男,江西南昌人,教授,博士,CCF高级会员,主要研究方向:机器学习、复杂网络分析;贺毅朝(1969-),男,河北晋州人,教授,硕士,CCF高级会员,主要研究方向:智能计算、组合优化、算法理论;吴聪聪(1975-),女,河北唐山人,讲师,硕士,主要研究方向:智能计算、信息检索。
  • 基金资助:

Abstract: In order to solve the problem that large amounts of supervised information was needed to achieve satisfactory performance, owing to the implementation of the semi-supervised community detection methods based on Non-negative Matrix Factorization (NMF) which selected prior information randomly, an Active Link Selection algorithm for semi-supervised community detection based on Graph regularization NMF (ALS_GNMF) was proposed. Firstly, in the iteration framework, the most uncertain and informative links were selected actively as prior information links. Secondly, the must-link constraints of these links, which generated the prior matrix, were added to enhance the connections in a certain community. At the same time, the cannot-link constraints were added, which modified the adjacency matrix, to weaken the connections between communities. Finally, the prior matrix was used as a graph regularization term to incorporate into the optimization objective function of NMF. And combining with network topology information, higher community discovery accuracy and robustness were achieved with less prior information. At the same prior ratio on both synthetic and real networks, experimental results demonstrate that the ALS_GNMF algorithm significantly outperformes the existing semi-supervised NMF algorithms in terms of efficiency, and it is stable especially on networks with unclear structure.

Key words: semi-supervised learning, active link selection, community detection, Non-negative Matrix Factorization (NMF)

摘要: 针对非负矩阵分解(NMF)半监督社区发现方法随机选择先验约束,导致提升相同性能需要更多约束信息的问题,提出一种基于迭代框架的主动链接选择半监督社区发现算法——ALS_GNMF。在迭代框架下,首先,主动选择不确定性高且对社区划分指导性强的链接对作为先验信息;其次,为主动选择的链接对增加must-link约束,增强社区间连接,生成先验矩阵;同时,增加cannot-link约束,减弱社区间连接,修改邻接矩阵;最后,将先验矩阵作为正则项,加入基于NMF的最优化目标函数,并融合网络拓扑结构信息,以期用较少的先验信息,达到较高的社区发现准确性和鲁棒性。实验结果表明,ALS_GNMF算法在真实网络及人工网络上,相同的先验比例下,性能比未采用迭代框架和主动策略的NMF半监督社区发现方法有更大的提升,且在结构不清晰的网络中表现稳定。

关键词: 半监督学习, 主动链接选择, 社区发现, 非负矩阵分解

CLC Number: