Journal of Computer Applications, 2022, Vol. 42, Issue (1): 1-8. DOI: 10.11772/j.issn.1001-9081.2021071221

• Artificial intelligence •

Unsupervised attributed graph embedding model based on node similarity

Yang LI1, Anbiao WU1, Ye YUAN2, Linlin ZHAO1, Guoren WANG2

  1. College of Computer Science and Engineering, Northeastern University, Shenyang Liaoning 110169, China
  2. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
  • Received: 2021-07-14 Revised: 2021-09-03 Accepted: 2021-09-15 Online: 2021-09-03 Published: 2022-01-10
  • Contact: Ye YUAN
  • About author: LI Yang, born in 1998, M. S. candidate. His research interests include graph neural networks and graph representation learning.
    WU Anbiao, born in 1993, Ph. D. candidate. His research interests include graph databases and graph neural networks.
    YUAN Ye, born in 1981, Ph. D., professor. His research interests include big data management and databases.
    ZHAO Linlin, born in 1997, M. S. candidate. Her research interests include graph representation learning and location-based social networks.
    WANG Guoren, born in 1966, Ph. D., professor. His research interests include uncertain data management, data-intensive computing, visual media data management and analysis, unstructured data management, distributed query processing and optimization, and bioinformatics.
  • Supported by:
    National Natural Science Foundation of China (61932004); Fundamental Research Funds for the Central Universities (N181605012)

基于节点相似度的无监督属性图嵌入模型

李扬1, 吴安彪1, 袁野2, 赵琳琳1, 王国仁2

  1. 东北大学 计算机科学与工程学院,沈阳 110169
  2. 北京理工大学 计算机学院,北京 100081
  • 通讯作者: 袁野
  • 作者简介:李扬(1998—),男,黑龙江勃利人,硕士研究生,CCF会员,主要研究方向:图神经网络、图表示学习
    吴安彪(1993—),男,河南商丘人,博士研究生,CCF会员,主要研究方向:图数据库、图神经网络
    袁野(1981—),男,辽宁沈阳人,教授,博士,CCF高级会员,主要研究方向:大数据管理、数据库
    赵琳琳(1997—),女,河南焦作人,硕士研究生,CCF会员,主要研究方向:图表示学习、位置社交网络
    王国仁(1966—),男,辽宁沈阳人,教授,博士,CCF杰出会员,主要研究方向:不确定数据管理、数据密集型计算、可视媒体数据管理与分析、非结构化数据管理、分布式查询处理与优化、生物信息学。
  • 基金资助:
    国家自然科学基金资助项目(61932004);中央高校基本科研业务费专项(N181605012)

Abstract:

Attributed graph embedding aims to represent the nodes of an attributed graph as low-dimensional vectors while preserving both the topology information and the attribute information of the nodes. Much work has been done on attributed graph embedding, but most of the proposed algorithms are supervised or semi-supervised. In practical applications, a large number of nodes must be labeled, which makes these algorithms difficult to apply and costly in manpower and material resources. These problems were reanalyzed from an unsupervised perspective, and an unsupervised attributed graph embedding algorithm was proposed. Firstly, the topology information and the attribute information of the nodes were calculated separately, using an existing non-attributed graph embedding algorithm and the attributes of the attributed graph respectively. Then, the embedding vectors of the nodes were obtained by a Graph Convolutional Network (GCN), and both the difference between the embedding vectors and the topology information and the difference between the embedding vectors and the attribute information were minimized. Finally, paired nodes with similar topology information and similar attribute information obtained similar embeddings. Compared with the Graph Auto-Encoder (GAE) method, the proposed method improved node classification accuracy by 1.2 percentage points on the Cora dataset and 2.4 percentage points on the Citeseer dataset. Experimental results show that the proposed method can effectively improve the quality of the generated embeddings.

Key words: attributed graph embedding, Graph Convolutional Network (GCN), node classification, node similarity, unsupervised
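
The three steps outlined in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the exact similarity measure or loss, so the code below assumes cosine similarity between nodes, a single GCN layer with fixed (untrained) weights, and a mean squared difference between the embedding similarities and the two target similarity matrices. The toy graph, the use of adjacency rows as a stand-in for the output of a non-attributed embedding algorithm, and all function names are hypothetical.

```python
import math

def cosine_similarity_matrix(rows):
    """Pairwise cosine similarity between row vectors (zero rows get norm 1)."""
    norms = [math.sqrt(sum(x * x for x in r)) or 1.0 for r in rows]
    return [[sum(a * b for a, b in zip(ri, rj)) / (ni * nj)
             for rj, nj in zip(rows, norms)]
            for ri, ni in zip(rows, norms)]

def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def normalized_adjacency(adj):
    """Symmetric GCN normalization: D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def gcn_layer(a_norm, features, weight):
    """One graph-convolution layer with ReLU: relu(A_norm X W)."""
    h = matmul(matmul(a_norm, features), weight)
    return [[max(0.0, v) for v in row] for row in h]

def similarity_loss(embedding, topo_sim, attr_sim):
    """Assumed objective: mean squared gap between the embedding similarity
    and each of the topology and attribute similarity matrices."""
    z_sim = cosine_similarity_matrix(embedding)
    n = len(z_sim)
    total = sum((z_sim[i][j] - topo_sim[i][j]) ** 2
                + (z_sim[i][j] - attr_sim[i][j]) ** 2
                for i in range(n) for j in range(n))
    return total / (n * n)

# Toy 4-node path graph 0-1-2-3 with 3 attributes per node (made-up data).
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
attrs = [[1, 0, 0], [1, 1, 0], [0, 1, 1], [0, 0, 1]]

# Step 1: topology and attribute information as similarity matrices.
# Adjacency rows stand in for vectors from a non-attributed embedding method.
topo_sim = cosine_similarity_matrix(adj)
attr_sim = cosine_similarity_matrix(attrs)

# Step 2: GCN embedding, using fixed illustrative weights (3 inputs, 2 outputs).
weight = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.6]]
embedding = gcn_layer(normalized_adjacency(adj), attrs, weight)

# Step 3: the quantity a training loop would minimize with respect to `weight`.
loss = similarity_loss(embedding, topo_sim, attr_sim)
print("similarity loss:", round(loss, 4))
```

In a real setting the weights would be updated by gradient descent on this loss, so that node pairs that are similar in both topology and attributes end up with similar embeddings; an automatic differentiation framework would normally replace the hand-written matrix code.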

摘要:

属性图嵌入旨在将属性图中的节点表示为低维向量,并同时保留节点的拓扑信息和属性信息。属性图嵌入已经有一系列相关工作,然而它们大多数提出的是有监督或半监督的算法。在实际应用中,需要标记的节点数量多,导致这些属性图嵌入算法的难度大,且需要消耗巨大的人力物力。针对上述问题以无监督的视角重新分析,提出了一种无监督的属性图嵌入算法。首先,通过已存在的无属性图嵌入算法和属性图的属性分别计算节点的拓扑信息和属性信息;其次,利用图卷积网络(GCN)得到节点的嵌入向量,并使得嵌入向量与拓扑信息以及嵌入向量与属性信息的差最小;最终,使拓扑信息和属性信息都相似的成对节点得到相似嵌入。与图自动编码器(GAE)方法相比,所提出的方法在Cora、Citeseer数据集上的节点分类准确率分别提升了1.2个百分点和2.4个百分点。实验结果表明,所提出的方法能够有效提高生成的嵌入的质量。

关键词: 属性图嵌入, 图卷积网络, 节点分类, 节点相似度, 无监督

CLC Number: