Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (10): 3054-3061.DOI: 10.11772/j.issn.1001-9081.2022101494

• Artificial intelligence •

Network representation learning based on autoencoder with optimized graph structure

Kun FU, Yuhan HAO, Minglei SUN, Yinghua LIU

  1. School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin 300401, China
  • Received:2022-10-12 Revised:2023-02-10 Accepted:2023-02-15 Online:2023-04-14 Published:2023-10-10
  • Contact: Kun FU
  • About author: HAO Yuhan, born in 1997 in Zhangjiakou, Hebei, M. S. candidate. Her research interests include network representation learning.
    SUN Minglei, born in 1992 in Chengde, Hebei, M. S. candidate. His research interests include network representation learning.
    LIU Yinghua, born in 1996 in Handan, Hebei, M. S. candidate. His research interests include social network analysis.
  • Supported by:
    National Natural Science Foundation of China(62072154)

Abstract:

The aim of Network Representation Learning (NRL) is to learn latent, low-dimensional representations of network vertices, which are then applied to downstream network analysis tasks. Existing autoencoder-based NRL algorithms extract node attribute information insufficiently and tend to produce information bias, which degrades the learning results. To address these problems, a Network Representation learning model based on an Autoencoder with optimized Graph Structure (NR-AGS) was proposed to improve accuracy by optimizing the graph structure. Firstly, structure and attribute information were fused to generate a joint transition matrix, thereby forming a high-dimensional representation. Secondly, a low-dimensional embedded representation was learned by an autoencoder. Finally, a deep embedded clustering algorithm was introduced during learning to form a self-supervision mechanism over the autoencoder training process and the division of nodes into categories; at the same time, an improved Maximum Mean Discrepancy (MMD) algorithm was used to reduce the gap between the distribution of the learned low-dimensional embedded representation and that of the original data. In addition, the autoencoder's reconstruction loss, the deep embedded clustering loss, and the improved MMD loss were used to optimize the network jointly. NR-AGS was applied to three real-world datasets, and the obtained low-dimensional representations were used for downstream tasks such as node classification and node clustering. Experimental results show that, compared with the deep graph representation model DNGR (Deep Neural networks for Graph Representations), NR-AGS improves the Micro-F1 score by at least 7.2, 13.5, and 8.2 percentage points on the Cora, Citeseer, and Wiki datasets, respectively. Thus, NR-AGS can effectively improve the results of NRL.
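The abstract's first step, fusing structure and attribute information into a joint transition matrix, could be sketched as follows. This is a minimal illustration, not the paper's exact formulation: it assumes the structure part is a row-normalized adjacency matrix, the attribute part is a row-normalized non-negative cosine-similarity matrix, and the two are mixed by a hypothetical weight `lam`.

```python
import numpy as np

def row_normalize(m):
    """Scale each row to sum to 1 (rows of all zeros are left as zeros)."""
    s = m.sum(axis=1, keepdims=True)
    s[s == 0] = 1.0  # avoid division by zero for isolated nodes
    return m / s

def joint_transition(adj, attrs, lam=0.5):
    """Sketch of a structure-attribute joint transition matrix.

    adj   : (n, n) adjacency matrix of the network
    attrs : (n, d) node attribute matrix
    lam   : hypothetical mixing weight between structure and attributes
    """
    # Structure transition probabilities from the adjacency matrix
    p_struct = row_normalize(adj.astype(float))
    # Attribute transition probabilities from pairwise cosine similarity
    norms = np.linalg.norm(attrs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    unit = attrs / norms
    sim = np.clip(unit @ unit.T, 0.0, None)  # keep non-negative similarities
    p_attr = row_normalize(sim)
    # Convex combination of the two transition matrices
    return lam * p_struct + (1 - lam) * p_attr
```

Each row of the result remains a probability distribution, so the fused matrix can serve directly as the high-dimensional input representation fed to the autoencoder.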

Key words: Network Representation Learning (NRL), attribute information, autoencoder, deep embedded clustering, Maximum Mean Discrepancy (MMD)
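The MMD term described in the abstract measures the gap between the embedding distribution and the original data distribution. The paper's improved variant is not detailed here, so the following is a sketch of the standard (biased) squared-MMD estimate with a Gaussian kernel, with `sigma` as a hypothetical bandwidth parameter.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and b."""
    # Pairwise squared Euclidean distances via the expansion ||a-b||^2
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-sq / (2.0 * sigma**2))

def mmd2(x, y, sigma=1.0):
    """Biased estimate of squared MMD between samples x and y."""
    kxx = gaussian_kernel(x, x, sigma).mean()
    kyy = gaussian_kernel(y, y, sigma).mean()
    kxy = gaussian_kernel(x, y, sigma).mean()
    return kxx + kyy - 2.0 * kxy
```

In a joint objective of the shape described in the abstract, this term would be added to the reconstruction and deep embedded clustering losses, e.g. `loss = recon + alpha * dec + beta * mmd2(z, x)` with hypothetical weights `alpha` and `beta`; the estimate is zero when the two samples coincide and grows as the distributions separate.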

