Unsupervised attributed graph embedding model based on node similarity

Journal of Computer Applications, 2022, Vol. 42, Issue (1): 1-8. DOI: 10.11772/j.issn.1001-9081.2021071221

Special Issue: Artificial Intelligence

Yang LI¹, Anbiao WU¹, Ye YUAN², Linlin ZHAO¹, Guoren WANG²

1. College of Computer Science and Engineering, Northeastern University, Shenyang Liaoning 110169, China
2. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China

Received: 2021-07-14 Revised: 2021-09-03 Accepted: 2021-09-15 Online: 2021-09-03 Published: 2022-01-10
Contact: Ye YUAN
About author: LI Yang, born in 1998 in Boli, Heilongjiang, M. S. candidate, CCF member. His research interests include graph neural networks and graph representation learning.
WU Anbiao, born in 1993 in Shangqiu, Henan, Ph. D. candidate, CCF member. His research interests include graph databases and graph neural networks.
YUAN Ye, born in 1981 in Shenyang, Liaoning, Ph. D., professor, CCF senior member. His research interests include big data management and databases.
ZHAO Linlin, born in 1997 in Jiaozuo, Henan, M. S. candidate, CCF member. Her research interests include graph representation learning and location-based social networks.
WANG Guoren, born in 1966 in Shenyang, Liaoning, Ph. D., professor, CCF distinguished member. His research interests include uncertain data management, data-intensive computing, visual media data management and analysis, unstructured data management, distributed query processing and optimization, and bioinformatics.
Supported by: National Natural Science Foundation of China (61932004); Fundamental Research Funds for the Central Universities (N181605012)

Abstract

Attributed graph embedding aims to represent the nodes of an attributed graph as low-dimensional vectors while preserving both the topology information and the attribute information of the nodes. Much work has been done on attributed graph embedding, but most of the proposed algorithms are supervised or semi-supervised. In practical applications, the number of nodes that must be labeled is large, which makes these algorithms difficult to apply and costly in manpower and material resources. To address these problems, attributed graph embedding was reanalyzed from an unsupervised perspective, and an unsupervised attributed graph embedding algorithm was proposed. Firstly, the topology information and the attribute information of the nodes were computed from an existing non-attributed graph embedding algorithm and from the attributes of the attributed graph, respectively. Then, the embedding vectors of the nodes were obtained with a Graph Convolutional Network (GCN), and both the difference between the embedding vectors and the topology information and the difference between the embedding vectors and the attribute information were minimized. Finally, pairs of nodes with similar topology information and similar attribute information obtained similar embeddings. Compared with the Graph Auto-Encoder (GAE) method, the proposed method improves node classification accuracy by 1.2 percentage points on the Cora dataset and by 2.4 percentage points on the Citeseer dataset. Experimental results show that the proposed method can effectively improve the quality of the generated embeddings.

Key words: attributed graph embedding, Graph Convolutional Network (GCN), node classification, node similarity, unsupervised
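
The three steps described in the abstract can be illustrated with a small NumPy sketch. The abstract does not give the exact formulas, so everything specific here is an assumption made for illustration: cosine similarity as the similarity measure, the mixing rule `S = alpha * S_T + (1 - alpha) * S_A`, a single linear GCN propagation step, a mean-squared loss, and random vectors standing in for the output of a non-attributed embedding algorithm such as DeepWalk. It is not the authors' implementation.

```python
import numpy as np


def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit_rows = X / np.maximum(norms, 1e-12)  # guard against zero rows
    return unit_rows @ unit_rows.T


def normalized_adjacency(adj):
    """GCN propagation matrix D^{-1/2} (adj + I) D^{-1/2}."""
    adj_self = adj + np.eye(adj.shape[0])
    inv_sqrt_deg = 1.0 / np.sqrt(adj_self.sum(axis=1))
    return adj_self * inv_sqrt_deg[:, None] * inv_sqrt_deg[None, :]


def similarity_loss(H, S):
    """Mean squared gap between embedding similarity and the target S."""
    return float(np.mean((cosine_similarity_matrix(H) - S) ** 2))


rng = np.random.default_rng(0)

# Toy attributed graph: a 4-node path 0-1-2-3 with m = 5 attributes per node.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
A = rng.random((4, 5))         # attribute matrix A
H_T = rng.normal(size=(4, 3))  # random stand-in for a topology embedding

# Step 1: topology and attribute similarity, mixed by alpha (assumed form).
alpha = 0.5
S = alpha * cosine_similarity_matrix(H_T) + (1 - alpha) * cosine_similarity_matrix(A)

# Step 2: one linear GCN propagation step with untrained weights W (d = 2).
W = rng.normal(size=(5, 2))
H = normalized_adjacency(adj) @ A @ W

# Step 3: the quantity a training loop would minimize with respect to W.
loss = similarity_loss(H, S)
print(f"similarity loss before training: {loss:.4f}")
```

In a full model, `W` would be updated by gradient descent so that node pairs with high target similarity in `S` end up with similar rows in `H`.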

CLC Number: TP399

Yang LI, Anbiao WU, Ye YUAN, Linlin ZHAO, Guoren WANG. Unsupervised attributed graph embedding model based on node similarity[J]. Journal of Computer Applications, 2022, 42(1): 1-8.

Figures/Tables 8

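Tab. 1 Main parameters

Variable	Definition
$G$	Attributed graph
$V$	Node set of the attributed graph
$E$	Edge set of the attributed graph
$|V|$	Number of nodes in the attributed graph
$A$	Attribute matrix of the attributed graph
$v_i$	The $i$-th node
$A_i$	Attribute vector of the $i$-th node
$H$	Embedding of the attributed graph
$H_i$	Embedding vector of the $i$-th node
$H^T$	Topology embedding of the attributed graph
$H^T_i$	Topology embedding of the $i$-th node
$S$	Similarity matrix of the attributed graph
$S^T$	Topology similarity matrix of the attributed graph
$S^A$	Attribute similarity matrix of the attributed graph
$\alpha$	Hyperparameter
$d$	Dimension of the embedding vectors
$m$	Dimension of the attribute vectors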
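
The definitions in Tab. 1 imply the following shapes, written out here as a reading aid. This assumes one row per node and pairwise similarity matrices. It also assumes that the letters T and A attached to $H$ and $S$ are labels for "topology" and "attribute" and not a transpose. The dimension of the topology embedding is not stated in the table, so it is left out.

```latex
A \in \mathbb{R}^{|V| \times m}, \quad A_i \in \mathbb{R}^{m}, \qquad
H \in \mathbb{R}^{|V| \times d}, \quad H_i \in \mathbb{R}^{d}, \qquad
S,\; S^{T},\; S^{A} \in \mathbb{R}^{|V| \times |V|}
```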

Fig. 1 Unsupervised attributed graph embedding model

Tab. 2 Dataset statistical information

Dataset	Number of nodes	Number of edges	Number of classes	Number of features
Cora	2 708	5 429	7	1 433
Citeseer	3 327	4 732	6	3 703
Pubmed	19 717	44 338	3	500
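
As a quick way to read the statistics in Tab. 2, the sketch below derives the average node degree of each dataset. It assumes the graphs are undirected and that each edge is counted once in the table, so the average degree is 2|E|/|V|. The table itself does not state either point.

```python
# (number of nodes, number of edges) copied from Tab. 2
stats = {
    "Cora": (2708, 5429),
    "Citeseer": (3327, 4732),
    "Pubmed": (19717, 44338),
}

# Average degree under the undirected, count-once assumption: 2|E| / |V|
avg_degree = {name: 2 * edges / nodes for name, (nodes, edges) in stats.items()}

for name, degree in avg_degree.items():
    print(f"{name}: average degree {degree:.2f}")
```

Under this assumption all three graphs are sparse, with only a few neighbors per node on average.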

Tab. 3 Accuracy comparison of node classification task on different datasets (%)

Method	Cora	Citeseer	Pubmed
Raw features	47.9	49.4	69.1
DeepWalk	67.2	43.2	65.3
LP	68.0	45.3	63.0
DeepWalk+features	70.7	51.4	74.3
VGAE	72.4	55.7	71.6
GAE	80.5	69.1	78.1
GraphSAGE-LSTM	50.1	40.3	77.1
GraphSAGE-pool	57.5	45.9	79.9
GraphSAGE-mean	67.0	52.8	79.3
GraphSAGE-GCN	74.3	54.5	77.5
Proposed method	81.7	71.5	79.0
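
The gains quoted in the abstract can be checked against Tab. 3 with a few lines of Python. The numbers are copied from the table, with the last row labeled "Proposed method". The variable names are illustrative only.

```python
datasets = ("Cora", "Citeseer", "Pubmed")

# Node classification accuracy (%) copied from Tab. 3
accuracy = {
    "Raw features": (47.9, 49.4, 69.1),
    "DeepWalk": (67.2, 43.2, 65.3),
    "LP": (68.0, 45.3, 63.0),
    "DeepWalk+features": (70.7, 51.4, 74.3),
    "VGAE": (72.4, 55.7, 71.6),
    "GAE": (80.5, 69.1, 78.1),
    "GraphSAGE-LSTM": (50.1, 40.3, 77.1),
    "GraphSAGE-pool": (57.5, 45.9, 79.9),
    "GraphSAGE-mean": (67.0, 52.8, 79.3),
    "GraphSAGE-GCN": (74.3, 54.5, 77.5),
    "Proposed method": (81.7, 71.5, 79.0),
}

# Gain of the proposed method over GAE, in percentage points
gain_over_gae = {
    ds: round(p - g, 1)
    for ds, p, g in zip(datasets, accuracy["Proposed method"], accuracy["GAE"])
}

# Best-scoring method on each dataset
best_method = {
    ds: max(accuracy, key=lambda method: accuracy[method][i])
    for i, ds in enumerate(datasets)
}

print(gain_over_gae)
print(best_method)
```

The check reproduces the 1.2 and 2.4 percentage point gains on Cora and Citeseer. It also shows that on Pubmed the proposed method, at 79.0, sits slightly below GraphSAGE-pool (79.9) and GraphSAGE-mean (79.3).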

Fig. 2 Visualization result of DeepWalk algorithm on Cora dataset

Fig. 3 Visualization result of the proposed method on Cora dataset

Fig. 4 Node classification accuracy with different hyperparameter α

Fig. 5 Node classification accuracy with different embedding vector dimension d
