Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (9): 2643-2651.DOI: 10.11772/j.issn.1001-9081.2021071354

• Artificial intelligence •

Semi-supervised representation learning method combining graph auto-encoder and clustering

Hangyuan DU1, Sicong HAO1, Wenjian WANG1,2

  1. School of Computer and Information Technology, Shanxi University, Taiyuan Shanxi 030006, China
  2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education (Shanxi University), Taiyuan Shanxi 030006, China
  • Received:2021-07-28 Revised:2021-10-18 Accepted:2021-10-21 Online:2021-11-10 Published:2022-09-10
  • Contact: Wenjian WANG
  • About author: DU Hangyuan, born in 1985, Ph. D., associate professor. His research interests include cluster analysis and complex networks.
    HAO Sicong, born in 1995, M. S. candidate. Her research interests include machine learning and network data mining.
  • Supported by:
    National Natural Science Foundation of China (61902227); Scientific and Technological Innovation Program of Higher Education Institutions in Shanxi Province (2019L0039); Natural Science Foundation of Shanxi Province (201901D211192)

结合图自编码器与聚类的半监督表示学习方法

杜航原1, 郝思聪1, 王文剑1,2

  1. 山西大学 计算机与信息技术学院,太原 030006
  2. 计算智能与中文信息处理教育部重点实验室(山西大学),太原 030006
  • 通讯作者: 王文剑
  • 作者简介:杜航原(1985—),男,山西太原人,副教授,博士,CCF会员,主要研究方向:聚类分析、复杂网络;
    郝思聪(1995—),女,山西高平人,硕士研究生,主要研究方向:机器学习、网络数据挖掘;
  • 基金资助:
    国家自然科学基金资助项目(61902227);山西省高等学校科技创新项目(2019L0039);山西省自然科学基金资助项目(201901D211192)

Abstract:

Node labels are a widely available form of supervision information in complex networks and play an important role in network representation learning. Based on this fact, a Semi-supervised Representation Learning method combining Graph Auto-Encoder and Clustering (GAECSRL) was proposed. Firstly, a Graph Convolutional Network (GCN) and an inner product function were used as the encoder and the decoder respectively to construct a graph auto-encoder, which forms an information propagation framework. Then, a k-means clustering module was added on top of the low-dimensional representation generated by the encoder, so that the training of the graph auto-encoder and the partitioning of nodes into categories formed a self-supervised mechanism. Finally, the discriminant information of the node labels was used to guide the category partitioning of the low-dimensional network representation. Network representation generation, category partitioning, and the training of the graph auto-encoder were integrated into a unified optimization model, yielding an effective network representation that incorporates node label information. In the simulation experiments, GAECSRL was applied to node classification and link prediction tasks.
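The components named in the abstract (a GCN encoder, an inner-product decoder, and a k-means step on the resulting embeddings) can be illustrated with a minimal NumPy sketch. This is only a forward pass on a toy graph with random, untrained weights; the abstract does not give the unified loss or how labels enter it, so none of that is reproduced here. The function names, layer sizes, one-hot features, and initial cluster centers are all assumptions made for illustration.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetric GCN normalization: D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_encode(adj_norm, features, w0, w1):
    """Two-layer GCN encoder: Z = A_norm ReLU(A_norm X W0) W1."""
    hidden = np.maximum(adj_norm @ features @ w0, 0.0)
    return adj_norm @ hidden @ w1

def inner_product_decode(z):
    """Inner-product decoder: edge probability sigmoid(z_i . z_j)."""
    return 1.0 / (1.0 + np.exp(-(z @ z.T)))

def kmeans_assign(z, centers):
    """One k-means assignment step: nearest center per embedding."""
    dists = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

rng = np.random.default_rng(0)

# Toy graph (assumption): two triangles, nodes 0-2 and 3-5, joined by edge 2-3.
adj = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[i, j] = adj[j, i] = 1.0

features = np.eye(6)                            # one-hot features (assumption)
w0 = rng.normal(scale=0.5, size=(6, 8))         # random, untrained weights
w1 = rng.normal(scale=0.5, size=(8, 2))

adj_norm = normalize_adjacency(adj)
z = gcn_encode(adj_norm, features, w0, w1)      # low-dimensional representation
adj_rec = inner_product_decode(z)               # reconstructed adjacency
centers = z[[0, 5]]                             # initial centers (assumption)
clusters = kmeans_assign(z, centers)            # cluster index per node
```

In a trained model the weights would be optimized so that `adj_rec` approximates `adj`, and the cluster assignments would feed back into training; how GAECSRL couples these terms is described in the full paper, not in this sketch.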
Experimental results show that, compared with DeepWalk, node2vec, learning Graph Representations with global structural information (GraRep), Structural Deep Network Embedding (SDNE) and Planetoid (Predicting labels and neighbors with embeddings transductively or inductively from data), GAECSRL increases the Micro-F1 index by 0.9 to 24.46 percentage points and the Macro-F1 index by 0.76 to 24.20 percentage points in the node classification task; in the link prediction task, GAECSRL increases the AUC (Area Under Curve) index by 0.33 to 9.06 percentage points. These results indicate that the network representation obtained by GAECSRL effectively improves the performance of node classification and link prediction tasks.
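For readers less familiar with the three reported metrics, the following sketch computes them from their standard definitions on made-up toy data. It is not the paper's evaluation code; the helper names `f1_scores` and `auc` and all input values are hypothetical.

```python
import numpy as np

def f1_scores(y_true, y_pred, num_classes):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.zeros(num_classes)
    fp = np.zeros(num_classes)
    fn = np.zeros(num_classes)
    for c in range(num_classes):
        tp[c] = np.sum((y_pred == c) & (y_true == c))
        fp[c] = np.sum((y_pred == c) & (y_true != c))
        fn[c] = np.sum((y_pred != c) & (y_true == c))
    # Micro-F1 pools the counts over all classes before computing F1.
    micro = 2 * tp.sum() / (2 * tp.sum() + fp.sum() + fn.sum())
    # Macro-F1 is the unweighted mean of per-class F1 (0 for empty classes).
    denom = 2 * tp + fp + fn
    per_class = np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)
    return float(micro), float(per_class.mean())

def auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = np.asarray(scores_pos)[:, None]
    neg = np.asarray(scores_neg)[None, :]
    return float(((pos > neg) + 0.5 * (pos == neg)).mean())

# Toy node-classification labels (made up for illustration).
y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 1, 1, 1, 0]
micro_f1, macro_f1 = f1_scores(y_true, y_pred, num_classes=3)

# Toy link-prediction scores: existing edges vs. sampled non-edges (made up).
auc_value = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.1])
```

On this toy data Micro-F1 equals plain accuracy (5/7), while Macro-F1 is lower because the rare class 2 is never predicted, which is why the two indices are usually reported side by side. A gain stated in percentage points is the absolute difference between two such scores multiplied by 100.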

Key words: network representation learning, network embedding, node label, graph neural network, self-supervised mechanism

摘要:

节点标签是复杂网络中广泛存在的监督信息,对网络表示学习具有重要作用。基于此,提出了一种结合图自编码器与聚类的半监督表示学习方法(GAECSRL)。首先,以图卷积网络(GCN)和内积函数分别作为编码器和解码器,并构建图自编码器以形成信息传播框架;然后,在编码器生成的低维表示基础上增加k-means聚类模块,从而使图自编码器的训练过程和节点的类别分布划分形成自监督机制;最后,利用节点标签的判别信息对网络低维表示的类别划分进行指导,将网络表示生成、类别划分以及图自编码器的训练构建在一个统一的优化模型中,并获得融合节点标签信息的有效网络表示结果。在仿真实验中,将GAECSRL用于节点分类和链接预测任务。实验结果表明,相比DeepWalk、node2vec、全局结构信息图表示学习(GraRep)、结构化深度网络嵌入(SDNE)和用数据的转导式或归纳式嵌入预测标签和邻居(Planetoid),在节点分类任务中GAECSRL的Micro-F1指标提高了0.9~24.46个百分点,Macro-F1指标提高了0.76~24.20个百分点;在链接预测任务中,GAECSRL的AUC指标提高了0.33~9.06个百分点,说明GAECSRL获得的网络表示结果能有效提高节点分类和链接预测任务的性能。

关键词: 网络表示学习, 网络嵌入, 节点标签, 图神经网络, 自监督机制

CLC Number: