Semi-supervised representation learning method combining graph auto-encoder and clustering

doi:10.11772/j.issn.1001-9081.2021071354

Journal of Computer Applications, 2022, Vol. 42, Issue (9): 2643-2651. DOI: 10.11772/j.issn.1001-9081.2021071354

Special Issue: Artificial Intelligence

Hangyuan DU¹, Sicong HAO¹, Wenjian WANG¹,²

1. School of Computer and Information Technology, Shanxi University, Taiyuan Shanxi 030006, China
2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education (Shanxi University), Taiyuan Shanxi 030006, China

Received: 2021-07-28 Revised: 2021-10-18 Accepted: 2021-10-21 Online: 2021-11-10 Published: 2022-09-10
Contact: Wenjian WANG
About author: DU Hangyuan, born in 1985, Ph. D., associate professor. His research interests include cluster analysis and complex networks.
HAO Sicong, born in 1995, M. S. candidate. Her research interests include machine learning and network data mining.
Supported by:
National Natural Science Foundation of China (61902227); Scientific and Technological Innovation Program of Higher Education Institutions in Shanxi Province (2019L0039); Natural Science Foundation of Shanxi Province (201901D211192)

结合图自编码器与聚类的半监督表示学习方法

杜航原¹, 郝思聪¹, 王文剑¹,²

1. 山西大学计算机与信息技术学院，太原 030006
2. 计算智能与中文信息处理教育部重点实验室（山西大学），太原 030006

通讯作者: 王文剑
作者简介:杜航原（1985—），男，山西太原人，副教授，博士，CCF会员，主要研究方向：聚类分析、复杂网络；
郝思聪（1995—），女，山西高平人，硕士研究生，主要研究方向：机器学习、网络数据挖掘；
基金资助:
国家自然科学基金资助项目(61902227);山西省高等学校科技创新项目(2019L0039);山西省自然科学基金资助项目(201901D211192)

Abstract:

Node labels are supervision information that is widely available in complex networks, and they play an important role in network representation learning. Based on this fact, a Semi-supervised Representation Learning method combining Graph Auto-Encoder and Clustering (GAECSRL) was proposed. Firstly, a Graph Convolutional Network (GCN) and an inner product function were used as the encoder and the decoder respectively to construct a graph auto-encoder that serves as an information propagation framework. Then, a k-means clustering module was added on top of the low-dimensional representation generated by the encoder, so that the training of the graph auto-encoder and the partition of the nodes into categories formed a self-supervised mechanism. Finally, the discriminant information of the node labels was used to guide the category partition of the low-dimensional network representation. Network representation generation, category partition, and graph auto-encoder training were built into a unified optimization model, yielding an effective network representation that integrates node label information. In the simulation experiments, GAECSRL was applied to node classification and link prediction tasks. Experimental results show that, compared with DeepWalk, node2vec, learning Graph Representations with global structural information (GraRep), Structural Deep Network Embedding (SDNE) and Planetoid (Predicting labels and neighbors with embeddings transductively or inductively from data), GAECSRL increases the Micro-F1 index by 0.9 to 24.46 percentage points and the Macro-F1 index by 0.76 to 24.20 percentage points in the node classification task; in the link prediction task, GAECSRL increases the AUC (Area Under Curve) index by 0.33 to 9.06 percentage points. These results indicate that the network representation obtained by GAECSRL effectively improves the performance of node classification and link prediction.
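
The pipeline described in the abstract can be illustrated with a small forward-pass sketch. This is not the authors' implementation: the toy graph, the random untrained weights, the two-layer encoder, the layer sizes, and the 0.1 weight on the clustering term are all assumptions made for illustration, and the label-guidance term is omitted because the abstract does not give its form.

```python
import numpy as np


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, the usual GCN propagation matrix."""
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_encode(a_norm, x, w0, w1):
    """Two-layer GCN encoder: Z = A_norm ReLU(A_norm X W0) W1."""
    hidden = np.maximum(a_norm @ x @ w0, 0.0)
    return a_norm @ hidden @ w1


def inner_product_decode(z):
    """Reconstruct edge probabilities as sigmoid(Z Z^T)."""
    return 1.0 / (1.0 + np.exp(-(z @ z.T)))


def reconstruction_loss(adj, a_rec, eps=1e-9):
    """Mean binary cross-entropy between the adjacency matrix and its reconstruction."""
    return float(-np.mean(adj * np.log(a_rec + eps)
                          + (1.0 - adj) * np.log(1.0 - a_rec + eps)))


def kmeans(z, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means on the embeddings; returns (centroids, assignments)."""
    rng = np.random.default_rng(seed)
    centroids = z[rng.choice(len(z), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        for c in range(k):
            if np.any(assign == c):               # keep the old centroid if a cluster empties
                centroids[c] = z[assign == c].mean(axis=0)
    return centroids, assign


# Toy graph (assumption): two triangles joined by a single edge.
adj = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[i, j] = adj[j, i] = 1.0
x = np.eye(6)                                     # one-hot node features (assumption)

rng = np.random.default_rng(0)
w0 = rng.normal(scale=0.5, size=(6, 4))           # untrained weights, illustration only
w1 = rng.normal(scale=0.5, size=(4, 2))

z = gcn_encode(normalize_adjacency(adj), x, w0, w1)   # low-dimensional representation
a_rec = inner_product_decode(z)                       # decoder output
centroids, assign = kmeans(z, k=2)                    # clustering module on the embeddings

clustering_loss = float(((z - centroids[assign]) ** 2).sum(axis=1).mean())
total_loss = reconstruction_loss(adj, a_rec) + 0.1 * clustering_loss  # 0.1 is a hypothetical weight
```

In a trained model the weights would be updated to reduce a joint objective of this kind, so that the reconstruction and clustering terms shape the same embedding.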

Key words: network representation learning, network embedding, node label, graph neural network, self-supervised mechanism

摘要：

节点标签是复杂网络中广泛存在的监督信息，对网络表示学习具有重要作用。基于此，提出了一种结合图自编码器与聚类的半监督表示学习方法（GAECSRL）。首先，以图卷积网络（GCN）和内积函数分别作为编码器和解码器，并构建图自编码器以形成信息传播框架；然后，在编码器生成的低维表示基础上增加k-means聚类模块，从而使图自编码器的训练过程和节点的类别分布划分形成自监督机制；最后，利用节点标签的判别信息对网络低维表示的类别划分进行指导，将网络表示生成、类别划分以及图自编码器的训练构建在一个统一的优化模型中，并获得融合节点标签信息的有效网络表示结果。在仿真实验中，将GAECSRL用于节点分类和链接预测任务。实验结果表明，相比DeepWalk、node2vec、全局结构信息图表示学习（GraRep）、结构化深度网络嵌入（SDNE）和用数据的转导式或归纳式嵌入预测标签和邻居（Planetoid），在节点分类任务中GAECSRL的Micro-F1指标提高了0.9~24.46个百分点，Macro-F1指标提高了0.76~24.20个百分点；在链接预测任务中，GAECSRL的AUC指标提高了0.33~9.06个百分点，说明GAECSRL获得的网络表示结果能有效提高节点分类和链接预测任务的性能。

关键词: 网络表示学习, 网络嵌入, 节点标签, 图神经网络, 自监督机制

CLC Number:

TP183

Hangyuan DU, Sicong HAO, Wenjian WANG. Semi-supervised representation learning method combining graph auto-encoder and clustering[J]. Journal of Computer Applications, 2022, 42(9): 2643-2651.

杜航原, 郝思聪, 王文剑. 结合图自编码器与聚类的半监督表示学习方法[J]. 计算机应用, 2022, 42(9): 2643-2651.

Figures/Tables 7

References 21

1	孙金清, 周慧, 赵中英. 网络表示学习方法研究综述[J]. 山东科技大学学报(自然科学版), 2021, 40(1): 117-128. DOI: 10.16452/j.cnki.sdkjzk.2021.01.014
	SUN J Q, ZHOU H, ZHAO Z Y. A survey of network representation learning methods[J]. Journal of Shandong University of Science and Technology (Natural Science), 2021, 40(1): 117-128. DOI: 10.16452/j.cnki.sdkjzk.2021.01.014
2	YANG C, LIU Z Y, ZHAO D L, et al. Network representation learning with rich text information[C]// Proceedings of the 24th International Joint Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2015: 2111-2117. DOI: 10.1609/aaai.v29i1.9448
3	CAO S S, LU W, XU Q K. GraRep: learning graph representations with global structural information[C]// Proceedings of the 24th ACM International Conference on Information and Knowledge Management. New York: ACM, 2015: 891-900. DOI: 10.1145/2806416.2806512
4	OU M D, CUI P, PEI J, et al. Asymmetric transitivity preserving graph embedding[C]// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2016: 1105-1114. DOI: 10.1145/2939672.2939751
5	WANG X, CUI P, WANG J, et al. Community preserving network embedding[C]// Proceedings of the 31st AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2017: 203-209. DOI: 10.1609/aaai.v31i1.10488
6	PEROZZI B, AL-RFOU R, SKIENA S. DeepWalk: online learning of social representations[C]// Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2014: 701-710. DOI: 10.1145/2623330.2623732
7	GROVER A, LESKOVEC J. Node2vec: scalable feature learning for networks[C]// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2016: 855-864. DOI: 10.1145/2939672.2939754
8	WANG D X, CUI P, ZHU W W. Structural deep network embedding[C]// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2016: 1225-1234. DOI: 10.1145/2939672.2939753
9	CAO S S, LU W, XU Q K. Deep neural networks for learning graph representations[C]// Proceedings of the 30th AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2016: 1145-1152. DOI: 10.1609/aaai.v30i1.10179
10	WANG H W, WANG J, WANG J L, et al. GraphGAN: graph representation learning with generative adversarial nets[C]// Proceedings of the 32nd AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2018: 2508-2515. DOI: 10.1609/aaai.v32i1.11872
11	KIPF T N, WELLING M. Semi-supervised classification with graph convolutional networks[EB/OL]. (2017-02-22) [2021-07-14]. DOI: 10.48550/arXiv.1609.02907
12	YANG Z L, COHEN W, SALAKHUTDINOV R. Revisiting semi-supervised learning with graph embeddings[C]// Proceedings of the 33rd International Conference on Machine Learning. New York: JMLR.org, 2016: 40-48.
13	ZHANG X, CHEN W Z, YAN H F. TLINE: scalable transductive network embedding[C]// Proceedings of the 2016 Asia Information Retrieval Symposium, LNCS 9994. Cham: Springer, 2016: 98-110.
14	ZHANG D K, YIN J, ZHU X Q, et al. Network representation learning: a survey[J]. IEEE Transactions on Big Data, 2020, 6(1): 3-28. DOI: 10.1109/tbdata.2018.2850013
15	SPERDUTI A, STARITA A. Supervised neural networks for the classification of structures[J]. IEEE Transactions on Neural Networks, 1997, 8(3): 714-735. DOI: 10.1109/72.572108
16	GORI M, MONFARDINI G, SCARSELLI F. A new model for learning in graph domains[C]// Proceedings of the 2005 IEEE International Joint Conference on Neural Networks. Piscataway: IEEE, 2005: 729-734.
17	SCARSELLI F, GORI M, TSOI A C, et al. The graph neural network model[J]. IEEE Transactions on Neural Networks, 2009, 20(1): 61-80. DOI: 10.1109/tnn.2008.2005605
18	GALLICCHIO C, MICHELI A. Graph echo state networks[C]// Proceedings of the 2010 International Joint Conference on Neural Networks. Piscataway: IEEE, 2010: 1-8. DOI: 10.1109/ijcnn.2010.5596796
19	CUI P, WANG X, PEI J, et al. A survey on network embedding[J]. IEEE Transactions on Knowledge and Data Engineering, 2019, 31(5): 833-852. DOI: 10.1109/tkde.2018.2849727
20	WU Z H, PAN S R, CHEN F W, et al. A comprehensive survey on graph neural networks[J]. IEEE Transactions on Neural Networks and Learning Systems, 2021, 32(1): 4-24. DOI: 10.1109/tnnls.2020.2978386
21	KIPF T N, WELLING M. Variational graph auto-encoders[EB/OL]. (2016-11-21) [2021-07-15].

Dataset	Number of classes	Number of nodes	Number of edges	Feature dimension
Cora	7	2 708	5 429	1 433
CiteSeer	6	3 312	4 732	3 703
PubMed	3	19 717	44 338	500
Wiki	19	2 405	17 981	4 973

Actual result	Predicted result
	Positive	Negative
Positive	True positive (TP)	False negative (FN)
Negative	False positive (FP)	True negative (TN)
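
The confusion-matrix entries above are what Micro-F1 and Macro-F1 are built from. The following sketch uses the standard multi-class definitions (Micro-F1 pools TP, FP and FN over all classes; Macro-F1 averages the per-class F1 scores). The page does not reproduce the paper's own formulas, so treat these definitions and the toy labels as assumptions.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    """Return (Micro-F1, Macro-F1) for single-label multi-class predictions."""
    classes = sorted(set(y_true) | set(y_pred))
    counts = []
    for c in classes:
        # One-vs-rest confusion counts for class c.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        counts.append((tp, fp, fn))
    micro = f1_from_counts(sum(c[0] for c in counts),
                           sum(c[1] for c in counts),
                           sum(c[2] for c in counts))
    macro = sum(f1_from_counts(*c) for c in counts) / len(counts)
    return micro, macro


# Toy labels (hypothetical): eight nodes, three classes.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]
micro, macro = micro_macro_f1(y_true, y_pred)
print(f"Micro-F1 = {micro:.4f}, Macro-F1 = {macro:.4f}")
```

Because every node here has exactly one label, Micro-F1 coincides with accuracy (6 of 8 correct), while Macro-F1 weights the three classes equally regardless of their size.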

Dataset	Method	Labeling rate									Average Micro-F1 (%)
		90%	80%	70%	60%	50%	40%	30%	20%	10%	
Cora	GAECSRL	84.85	84.63	83.22	82.32	82.17	81.94	81.29	80.87	77.74	82.11
	DeepWalk	83.55	83.01	82.97	82.72	82.53	81.52	80.26	78.61	75.75	81.21
	node2vec	82.84	82.42	82.08	81.90	81.85	81.55	80.42	79.07	75.83	80.88
	GraRep	81.92	80.15	79.46	79.40	79.39	79.14	79.06	78.45	74.16	79.01
	SDNE	79.23	78.64	78.16	77.12	76.56	76.29	75.25	73.36	69.77	76.04
	Planetoid	75.89	74.72	73.29	72.67	71.59	70.54	68.19	66.24	63.57	70.74
CiteSeer	GAECSRL	75.19	74.72	73.25	72.83	72.31	71.53	71.27	70.49	68.01	72.18
	DeepWalk	61.21	60.59	59.75	59.08	58.85	58.27	57.43	55.36	51.98	58.06
	node2vec	62.26	61.57	61.27	61.13	60.42	59.44	58.87	56.87	53.85	59.52
	GraRep	55.78	54.83	54.74	54.37	54.12	53.22	53.12	53.01	51.58	53.86
	SDNE	52.81	52.06	50.67	49.56	49.50	48.53	47.77	46.62	44.41	49.10
	Planetoid	65.52	65.59	64.55	64.48	63.64	62.76	61.15	59.66	57.40	62.75
Wiki	GAECSRL	78.21	76.32	75.24	74.45	74.23	73.69	71.58	70.14	68.17	73.56
	DeepWalk	61.21	60.59	59.75	59.08	58.85	58.27	57.43	55.36	51.98	58.06
	node2vec	62.26	61.57	61.27	61.13	60.42	59.44	58.87	56.87	53.85	59.52
	GraRep	55.78	54.83	54.74	54.37	54.12	53.22	53.12	53.01	51.58	53.86
	SDNE	52.81	52.06	50.67	49.56	49.50	48.53	47.77	46.62	44.41	49.10
	Planetoid	74.23	74.37	73.25	72.55	71.69	70.36	70.24	69.57	67.10	71.48
PubMed	GAECSRL	82.34	81.53	80.97	80.17	79.21	78.85	78.34	75.29	74.91	79.07
	DeepWalk	80.48	79.74	78.97	77.39	76.23	75.20	74.94	73.86	71.02	76.43
	node2vec	81.61	80.01	79.87	79.31	78.47	77.28	76.83	75.67	74.09	78.13
	GraRep	80.14	79.58	78.23	77.67	76.32	75.90	74.77	73.52	72.27	76.49
	SDNE	72.93	72.23	71.18	70.06	69.96	69.15	68.61	67.48	66.09	69.74
	Planetoid	77.54	77.13	76.27	75.92	74.90	73.45	72.30	71.81	70.98	74.48
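
The "0.9 to 24.46 percentage points" range quoted in the abstract can be checked against the average column of the Micro-F1 table above. The sketch below copies those averages as printed on this page and takes the smallest and largest gap between GAECSRL and any baseline on any dataset; the dictionary layout and the function name `margin_range` are illustrative choices, not part of the paper.

```python
# Average Micro-F1 (%) per dataset, copied from the last column of the table above.
AVG_MICRO_F1 = {
    "Cora":     {"GAECSRL": 82.11, "DeepWalk": 81.21, "node2vec": 80.88,
                 "GraRep": 79.01, "SDNE": 76.04, "Planetoid": 70.74},
    "CiteSeer": {"GAECSRL": 72.18, "DeepWalk": 58.06, "node2vec": 59.52,
                 "GraRep": 53.86, "SDNE": 49.10, "Planetoid": 62.75},
    "Wiki":     {"GAECSRL": 73.56, "DeepWalk": 58.06, "node2vec": 59.52,
                 "GraRep": 53.86, "SDNE": 49.10, "Planetoid": 71.48},
    "PubMed":   {"GAECSRL": 79.07, "DeepWalk": 76.43, "node2vec": 78.13,
                 "GraRep": 76.49, "SDNE": 69.74, "Planetoid": 74.48},
}


def margin_range(table, ours="GAECSRL"):
    """Return (smallest, largest) gain of `ours` over every baseline, in percentage points."""
    margins = [
        round(scores[ours] - value, 2)
        for scores in table.values()
        for method, value in scores.items()
        if method != ours
    ]
    return min(margins), max(margins)


low, high = margin_range(AVG_MICRO_F1)
print(f"Micro-F1 gain: {low:.2f} to {high:.2f} percentage points")
```

The smallest gap comes from DeepWalk on Cora (82.11 − 81.21) and the largest from SDNE on Wiki (73.56 − 49.10), matching the range stated in the abstract.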

Dataset	Method	Labeling rate									Average Macro-F1 (%)
		90%	80%	70%	60%	50%	40%	30%	20%	10%	
Cora	GAECSRL	84.14	83.62	82.69	82.47	81.35	81.04	80.13	79.23	75.33	81.11
	DeepWalk	82.41	82.35	82.30	82.04	81.94	80.67	79.34	77.57	74.55	80.35
	node2vec	81.79	81.59	81.49	81.45	81.35	81.11	79.90	78.58	74.58	80.20
	GraRep	79.75	79.42	79.29	79.14	79.07	78.57	78.48	77.94	73.12	78.31
	SDNE	47.86	47.49	46.53	45.34	45.00	43.35	42.67	41.50	38.53	44.25
	Planetoid	69.46	68.49	68.22	67.44	66.28	65.93	64.95	63.75	57.83	65.82
CiteSeer	GAECSRL	73.74	72.15	71.23	70.86	69.71	68.74	66.83	63.37	59.48	68.46
	DeepWalk	56.11	55.79	54.84	54.35	54.11	53.95	52.72	51.18	47.71	53.42
	node2vec	56.75	56.59	56.25	55.93	55.57	54.80	54.16	52.16	49.33	54.62
	GraRep	50.15	49.44	48.84	48.76	48.53	48.01	47.60	47.20	45.67	48.24
	SDNE	47.86	47.49	46.53	45.34	45.00	43.35	42.67	41.50	38.53	44.25
	Planetoid	71.35	71.01	70.43	69.35	68.19	67.85	66.78	65.62	58.74	67.70
Wiki	GAECSRL	81.28	81.01	80.58	79.55	78.43	77.76	77.88	77.50	76.67	78.96
	DeepWalk	79.29	78.66	77.92	77.49	77.24	76.88	75.58	75.46	74.50	77.00
	node2vec	79.92	79.59	79.36	78.62	77.89	77.02	76.37	76.26	75.70	77.86
	GraRep	78.09	77.28	77.06	76.78	76.37	76.29	75.31	75.10	74.52	76.31
	SDNE	70.66	69.57	69.35	69.21	68.82	68.24	67.81	67.33	67.14	68.68
	Planetoid	76.56	76.17	75.83	75.54	74.57	73.53	72.00	71.51	70.65	74.04
PubMed	GAECSRL	73.74	72.15	71.23	70.86	69.71	68.74	66.83	63.37	59.48	68.46
	DeepWalk	69.46	68.49	68.22	67.44	66.28	65.93	64.95	63.75	57.83	65.82
	node2vec	47.86	47.49	46.53	45.34	45.00	43.35	42.67	41.50	38.53	44.25
	GraRep	56.11	55.79	54.84	54.35	54.11	53.95	52.72	51.18	47.71	53.42
	SDNE	56.75	56.59	56.25	55.93	55.57	54.80	54.16	52.16	49.33	54.62
	Planetoid	50.15	49.44	48.84	48.76	48.53	48.01	47.60	47.20	45.67	48.24

Dataset	Method	Labeling rate									Average AUC (%)
		90%	80%	70%	60%	50%	40%	30%	20%	10%	
Cora	GAECSRL	86.43	85.19	84.06	82.35	81.30	81.55	80.54	79.68	76.17	81.92
	DeepWalk	85.18	84.23	83.45	82.43	81.77	81.13	80.07	79.42	76.61	81.59
	node2vec	85.75	84.36	83.84	82.58	81.49	80.07	79.73	78.61	75.85	81.36
	GraRep	84.52	83.28	82.45	81.24	79.92	79.20	78.19	76.65	74.72	80.02
	SDNE	81.47	80.43	79.35	78.46	77.38	76.49	75.56	74.58	71.79	77.28
	Planetoid	78.86	77.67	76.52	75.28	74.34	73.42	72.26	70.57	68.88	74.20
CiteSeer	GAECSRL	89.96	88.69	87.12	86.84	86.09	85.45	84.37	83.65	81.47	85.96
	DeepWalk	88.65	87.41	86.73	85.48	84.96	84.14	83.29	81.74	79.86	84.70
	node2vec	89.54	88.43	87.27	86.63	85.79	84.81	83.73	82.05	80.47	85.41
	GraRep	87.47	86.26	85.37	84.52	83.19	82.35	81.24	80.34	78.49	83.25
	SDNE	85.58	84.73	83.28	82.26	81.68	80.47	79.39	78.57	75.14	81.23
	Planetoid	81.67	80.64	79.34	78.56	77.42	76.89	75.26	73.94	71.48	77.24
Wiki	GAECSRL	88.74	87.52	86.31	85.27	84.34	82.04	81.79	80.76	78.63	83.93
	DeepWalk	87.42	86.16	85.76	84.61	83.32	82.96	81.37	79.82	77.56	83.22
	node2vec	86.67	85.27	84.71	83.19	82.49	81.64	80.38	79.29	76.72	82.26
	GraRep	85.39	84.37	83.58	82.31	81.50	80.53	79.47	78.28	75.44	81.21
	SDNE	82.33	81.56	80.27	79.23	78.45	77.30	76.14	74.47	72.65	78.04
	Planetoid	79.67	78.61	77.32	76.56	75.60	74.65	72.21	71.35	69.46	75.05
PubMed	GAECSRL	92.74	91.63	90.22	89.62	88.45	87.32	86.29	85.87	83.74	88.43
	DeepWalk	91.54	90.12	89.86	88.60	87.41	86.63	85.37	84.59	82.84	87.44
	node2vec	90.94	89.31	88.17	87.75	86.74	85.47	84.53	82.16	80.27	86.15
	GraRep	88.71	87.42	86.57	85.39	84.41	83.25	82.17	81.34	79.36	84.29
	SDNE	85.34	84.74	83.27	82.32	81.45	80.18	79.36	78.25	76.68	81.29
	Planetoid	83.78	82.50	81.39	80.56	79.46	78.45	77.20	76.35	74.68	79.37
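
For readers unfamiliar with the metric in the table above, AUC for link prediction can be read as the probability that a true edge receives a higher score than a non-edge. The sketch below computes it by direct pairwise comparison. The scores are made up, and how the paper scores candidate links and samples non-edges is not stated on this page, so this illustrates only the metric itself.

```python
def pairwise_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical link scores, e.g. sigmoid of embedding inner products.
edge_scores = [0.9, 0.8, 0.4]        # held-out true edges
non_edge_scores = [0.7, 0.3, 0.4]    # sampled node pairs with no edge
auc = pairwise_auc(edge_scores, non_edge_scores)
print(f"AUC = {auc:.4f}")
```

The quadratic loop is fine for a small illustration; on real graphs a rank-based formulation or a library routine would be the usual choice.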
