计算机应用 ›› 2017, Vol. 37 ›› Issue (11): 3090-3094.DOI: 10.11772/j.issn.1001-9081.2017.11.3090

• 第十六届中国机器学习会议(CCML 2017) • 上一篇    下一篇

基于链接模型的主动半监督社区发现方法

柴变芳1, 王建岭2, 许冀伟1, 李文斌1   

  1. 1. 河北地质大学 信息工程学院, 石家庄 050031;
    2. 河北中医学院 公共课教学部, 石家庄 050200
  • 收稿日期:2017-05-16 修回日期:2017-06-01 出版日期:2017-11-10 发布日期:2017-11-11
  • 通讯作者: 王建岭
  • 作者简介:柴变芳(1979-),女,山西运城人,副教授,博士,主要研究方向:复杂网络分析、概率图模型、机器学习、社区发现;王建岭(1973-),男,河北巨鹿人,副教授,硕士,主要研究方向:数据挖掘;许冀伟(1979-),男,河北秦皇岛人,讲师,硕士,主要研究方向:聚类、复杂网络分析、深度学习;李文斌(1974-),男,江西南昌人,教授,博士,CCF高级会员,主要研究方向:机器学习、复杂网络。
  • 基金资助:
    国家自然科学基金资助项目(61503260)。

Active semi-supervised community detection method based on link model

CHAI Bianfang1, WANG Jianling2, XU Jiwei1, LI Wenbin1   

  1. 1. School of Information Engineering, Hebei Geo University, Shijiazhuang Hebei 050031, China;
    2. Department of Public Courses, Hebei University of Chinese Medicine, Shijiazhuang Hebei 050200, China
  • Received:2017-05-16 Revised:2017-06-01 Online:2017-11-10 Published:2017-11-11
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61503260).

摘要: 链接模型可对网络的社区发现问题建模,相比具有相同目标的对称模型和条件模型,PPL模型处理网络类型更多、社区发现准确率更高。但PPL模型是一个无监督模型,在网络社区结构不清晰时效果不佳,且不能利用易获取的先验信息。为使用尽可能少的先验,获得社区发现链接模型性能较大的提升,提出了一个主动节点先验学习(ANPL)算法,该算法主动选择效用高、易标记的成对约束进行标记,基于标记的约束对自动生成信息量更大的标记节点集合。基于PPL模型设计了一个融合网络拓扑结构和标记节点先验的半监督社区发现(SPPL)模型,并给出模型用于半监督社区发现的参数估计算法。人工网络和实际网络上的实验结果表明,利用ANPL获得的标记节点先验和网络拓扑结构,SPPL模型的社区发现准确率高于无监督PPL模型及当前流行的基于非负矩阵分解(NMF)的半监督社区发现模型。

关键词: 半监督社区发现, 主动学习, 链接模型, 最大期望算法, 约束先验

Abstract: Link model is able to model community detection problem on networks. Compared with other similar models including symmetric models and conditional models, PPL (Popularity and Productivity Link) deals more types of networks, and detects communities more accurately. But PPL is an unsupervised model, and works badly when the network structure is unclear. In addition, PPL is not able to utilize priors that are easily captained. In order to improve its performance by using as less as possible, an Active Node Prior Learning (ANPL) algorithm was provided. ANPL selected the highest utility and easily labeled pairwise constraints, and generated automatically more informative labeled nodes based on the labeled pairwise constraints. Based on the PPL model,a Semi-supervised PPL (SPPL) model was proposed for community detection, which combined the topology of network and node labels learned from the ANPL algorithm. Experiments on synthetic and real networks demonstrate that using node priors from the ANPL algorithm and the topology of a network, SPPL model excels to unsupervised PPL model and popular semi-supervised community detection models based on Non-negative Matrix Factorization (NMF).

Key words: semi-supervised community detection, active learning, link model, Expectation Maximization (EM) algorithm, constrained prior

中图分类号: