Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (7): 1956-1963. DOI: 10.11772/j.issn.1001-9081.2020081193

Special Issue: Data science and technology

• Data science and technology •

Deep network embedding method based on community optimization

LI Yafang, LIANG Ye, FENG Weiwei, ZU Baokai, KANG Yujian   

  1. Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
  • Received: 2020-08-12 Revised: 2020-12-17 Online: 2021-07-10 Published: 2021-01-22
  • Supported by:
    This work is partially supported by the Beijing Municipal Natural Science Foundation (4204085), the General Science and Technology Program of Beijing Municipal Education Commission (KM202010005015), and the China Postdoctoral Science Foundation (2019M650407).

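  • Corresponding author: LI Yafang
  • About the authors: LI Yafang (born 1988), female, from Cangzhou, Hebei; lecturer, Ph.D., CCF member; research interests: data mining and complex network analysis. LIANG Ye (born 1997), male, from Beijing; M.S. candidate; research interests: deep learning and network data mining. FENG Weiwei (born 1999), female, from Beijing; research interests: data mining. ZU Baokai (born 1988), female, from Xinji, Hebei; lecturer, Ph.D.; research interests: machine learning and data mining. KANG Yujian (born 1999), male, from Beijing; research interests: deep learning and machine learning.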

Abstract: With the rapid development of technologies such as modern network communication and social media, networked big data is hard to exploit because efficient, usable node representations are lacking. Network representation learning, which transforms high-dimensional, sparse network data into low-dimensional, compact and easy-to-apply node representations, has therefore attracted wide attention. However, existing network embedding methods first obtain low-dimensional feature vectors of nodes and then feed them to downstream applications (such as node classification, community discovery, link prediction and visualization) for further analysis; because no model is built for the specific application, satisfactory results are difficult to achieve. For the specific application of network community discovery, a deep auto-encoder clustering model that combines community structure optimization with low-dimensional feature representation of nodes was proposed, namely Community-Aware Deep Network Embedding (CADNE). Firstly, based on the deep auto-encoder model, low-dimensional node representations were learned by preserving the topological characteristics of the local and global links of the network; then, these representations were further optimized by using the clustering structure of the network. The method learns the low-dimensional node representations and the indicator vectors of the communities to which the nodes belong at the same time, so that the representations preserve both the topological characteristics of the original network structure and the clustering characteristics of the nodes. Compared with existing classical network embedding methods, CADNE achieves the best clustering results on the Citeseer and Cora datasets and improves accuracy by up to 0.525 on 20NewsGroup. In the classification task, CADNE performs best on the Blogcatalog and Citeseer datasets, with an improvement over the baseline methods of up to 0.512 on Blogcatalog with 20% training samples. In the visualization comparison, the CADNE model yields low-dimensional node representations with clearer class boundaries, which verifies that the proposed method has a better ability to learn low-dimensional node representations.
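The abstract does not state the objective function, so the following is only a minimal sketch of how a joint objective of this kind might be assembled, not the authors' formulation. It assumes a squared reconstruction error for the global structure, a graph-Laplacian smoothness term for the local links, and a k-means-style term for the community structure. The function names and the weights `alpha` and `beta` are hypothetical.

```python
import numpy as np

def assign_communities(H, centers):
    """Hypothetical helper: assign each node to its nearest community center."""
    # Squared distance from every embedding (n, d) to every center (k, d).
    dists = ((H[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def joint_loss(A, A_hat, H, centers, assign, alpha=1.0, beta=1.0):
    """Illustrative joint objective (an assumption, not the paper's exact loss).

    A       : (n, n) symmetric adjacency matrix
    A_hat   : (n, n) auto-encoder reconstruction of A
    H       : (n, d) low-dimensional node embeddings
    centers : (k, d) community centers in the embedding space
    assign  : (n,)   community index of each node
    """
    # Global structure: the decoder should reproduce each node's neighborhood.
    recon = np.sum((A_hat - A) ** 2)

    # Local structure: linked nodes should be close in the embedding space.
    # sum_ij A_ij * ||h_i - h_j||^2 equals 2 * trace(H^T L H) with L = D - A.
    laplacian = np.diag(A.sum(axis=1)) - A
    local = 2.0 * np.trace(H.T @ laplacian @ H)

    # Community structure: embeddings should lie near their community center.
    cluster = np.sum((H - centers[assign]) ** 2)

    return recon + alpha * local + beta * cluster

# Toy graph with two disconnected edges: (0, 1) and (2, 3).
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
H_good = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [1.0, 1.0]])
assign = assign_communities(H_good, centers)
loss_good = joint_loss(A, A, H_good, centers, assign)
```

In this sketch, the embeddings and the community assignments would be updated alternately or jointly during training, which is one plausible reading of learning both "at the same time"; the optimization scheme actually used is described in the full paper rather than in the abstract.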

Key words: large-scale complex network, community structure, deep learning, node low-dimensional representation, network embedding


CLC Number: