Journal of Computer Applications (《计算机应用》) ›› 2023, Vol. 43 ›› Issue (10): 3054-3061. DOI: 10.11772/j.issn.1001-9081.2022101494

• Artificial Intelligence •

Network representation learning based on autoencoder with optimized graph structure

Kun FU, Yuhan HAO, Minglei SUN, Yinghua LIU

  1. School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China
  • Received: 2022-10-12 Revised: 2023-02-10 Accepted: 2023-02-15 Online: 2023-04-14 Published: 2023-10-10
  • Contact: Kun FU
  • About author: HAO Yuhan, born in 1997 in Zhangjiakou, Hebei, M. S. candidate. Her research interests include network representation learning.
    SUN Minglei, born in 1992 in Chengde, Hebei, M. S. candidate. His research interests include network representation learning.
    LIU Yinghua, born in 1996 in Handan, Hebei, M. S. candidate. His research interests include social network analysis.
  • Supported by:
    National Natural Science Foundation of China (62072154)

Abstract:

Network Representation Learning (NRL) aims to learn latent, low-dimensional representations of network vertices, which are then used for downstream network analysis tasks. Existing autoencoder-based NRL algorithms extract node attribute information insufficiently and are prone to information bias during learning, which degrades the learned representations. To address these problems, a Network Representation learning model based on Autoencoder with optimized Graph Structure (NR-AGS) was proposed, which improves accuracy by optimizing the graph structure. Firstly, structure and attribute information were fused to generate a joint structure-attribute transition matrix, from which a high-dimensional representation was formed. Secondly, a low-dimensional embedded representation was learnt by an autoencoder. Finally, a deep embedded clustering algorithm was introduced during learning so that autoencoder training and the division of nodes into category distributions form a self-supervision mechanism, and an improved Maximum Mean Discrepancy (MMD) algorithm was used to reduce the gap between the distribution of the learnt low-dimensional latent embedding layer and the distribution of the original data. In addition, the reconstruction loss of the autoencoder, the deep embedded clustering loss and the improved MMD loss were used to optimize the network jointly. NR-AGS was applied to three real datasets, and the obtained low-dimensional representations were used for the downstream tasks of node classification and node clustering. Experimental results show that, compared with the deep graph representation model DNGR (Deep Neural networks for Graph Representations), NR-AGS improves the Micro-F1 score by at least 7.2, 13.5 and 8.2 percentage points on the Cora, Citeseer and Wiki datasets, respectively. Thus, NR-AGS can effectively improve the learning effect of NRL.

Key words: Network Representation Learning (NRL), attribute information, autoencoder, deep embedded clustering, Maximum Mean Discrepancy (MMD)
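The pipeline summarized in the abstract (a fused transition matrix, then reconstruction, clustering and MMD losses optimized jointly) can be illustrated with a minimal NumPy sketch. The abstract gives no formulas, so everything here is an assumption rather than the authors' method: the convex mix in `joint_transition`, the commonly used deep embedded clustering formulation (Student's t soft assignment with a sharpened target distribution), a plain biased Gaussian-kernel MMD estimate in place of the paper's improved MMD, and the hypothetical weights `alpha`, `beta` and `gamma`. The `ref` argument is likewise a hypothetical reference sample living in the same space as the embeddings.

```python
import numpy as np

def row_normalize(m):
    """Scale each row to sum to 1; all-zero rows stay zero."""
    m = np.asarray(m, dtype=float)
    sums = m.sum(axis=1, keepdims=True)
    return np.divide(m, sums, out=np.zeros_like(m), where=sums > 0)

def joint_transition(adj, attrs, alpha=0.5):
    """Hypothetical fusion: convex mix of structure and attribute transitions."""
    attrs = np.asarray(attrs, dtype=float)
    norms = np.maximum(np.linalg.norm(attrs, axis=1, keepdims=True), 1e-12)
    unit = attrs / norms
    sim = np.clip(unit @ unit.T, 0.0, None)  # cosine similarity, negatives dropped
    np.fill_diagonal(sim, 0.0)               # no self-transitions from attributes
    return alpha * row_normalize(adj) + (1.0 - alpha) * row_normalize(sim)

def dec_soft_assign(z, centers):
    """Student's t (1 degree of freedom) soft assignment of embeddings to centers."""
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    q = 1.0 / (1.0 + d2)
    return q / q.sum(axis=1, keepdims=True)

def dec_target(q):
    """Sharpened target distribution: square q, divide by soft cluster sizes."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) summed over all rows; eps guards against log(0)."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def mmd2_rbf(x, y, sigma=1.0):
    """Biased estimate of squared MMD between samples x and y (Gaussian kernel)."""
    def kernel(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return float(kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean())

def joint_loss(x, x_rec, z, centers, ref, beta=0.1, gamma=0.1):
    """Reconstruction + beta * clustering + gamma * MMD (weights are assumptions)."""
    rec = float(np.mean((x - x_rec) ** 2))
    q = dec_soft_assign(z, centers)
    clu = kl_divergence(dec_target(q), q)
    return rec + beta * clu + gamma * mmd2_rbf(z, ref)
```

In this sketch, `x` would be the high-dimensional rows derived from `joint_transition`, `x_rec` the autoencoder's reconstruction of them, and `z` the low-dimensional embeddings. In practice the three terms would be minimized by gradient descent in an autodiff framework, and NumPy is used here only to keep the arithmetic visible.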

CLC number: