Journal of Computer Applications ›› 2019, Vol. 39 ›› Issue (3): 651-655.DOI: 10.11772/j.issn.1001-9081.2018071509

Previous Articles     Next Articles

Network representation learning algorithm based on improved random walk

WANG Wentao, HUANG Ye, WU Lintao, KE Xuan, TANG Wan   

  1. College of Computer Science, South-Central University for Nationalities, Wuhan Hubei 430074, China
  • Received:2018-07-23 Revised:2018-09-03 Online:2019-03-10 Published:2019-03-11
  • Contact: 黄烨
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61103248), the Fundamental Research Funds for the Central Universities of South-Central University for Nationalities (CZY18014), the Innovative Research Program for Graduates of South-Central University for Nationalities (2018sycxjj269).

基于改进随机游走的网络表示学习算法

王文涛, 黄烨, 吴淋涛, 柯璇, 唐菀   

  1. 中南民族大学 计算机科学学院, 武汉 430074
  • 作者简介:王文涛(1967-),男,河北邯郸人,副教授,博士,主要研究方向:计算机网络与控制;黄烨(1993-),男,湖北大悟人,硕士研究生,主要研究方向:知识表示、神经网络;吴淋涛(1994-),男,湖南茶陵人,硕士研究生,主要研究方向:数据挖掘;柯璇(1994-),女,湖北阳新人,硕士研究生,主要研究方向:人工智能;唐菀(1974-),女,贵州都匀人,教授,博士,主要研究方向:光/无线网络协议、网络安全。
  • 基金资助:
    国家自然科学基金资助项目(61103248);中南民族大学中央高校基本科研业务费专项(CZY18014);中南民族大学研究生创新基金资助项目(2018sycxjj269)。

Abstract: Existing Word2vec-based Network Representation Learning (NRL) algorithms use a Random Walk (RW) to generate node sequence. The RW tends to select nodes with larger degrees, so that the node sequence can not reflect the network structure information well, decreasing the performance of the algorithm. To solve the problem, a new network representation learning algorithm based on improved random walk was proposed. Firstly, RLP-MHRW (Remove self-Loop Probability for Metropolis-Hastings Random Walk) was used to generate node sequence. This algorithm would not favor nodes with larger degrees while forming a node sequence, so that the obtained sequence can efficiently reflect the network structure information. Then, the node sequence was put into Skip-gram model to obtain the node representation vector. Finally, the performance of the network representation learning algorithm was measured by a link prediction task. Contrast experiment has been performed in four real network datasets. Compared with LINE (Large-scale Information Network Embedding) and node2vec on arXiv ASTRO-PH, the AUC (Area Under Curve) value of link prediction has increased by 8.9% and 3.5% respectively, and so do the other datasets. Experimental results show that RLP-MHRW can effectively improve the performance of the network representation learning algorithm based on Word2vec.

Key words: Network Representation Learning (NRL), Random Walk (RW), link prediction, unbiased sampling, Machine Learning (ML)

摘要: 现有的基于Word2vec的网络表示学习(NRL)算法使用随机游走(RW)来生成节点序列,针对随机游走倾向于选择具有较大度的节点,生成的节点序列不能很好地反映网络结构信息,从而影响表示学习性能的问题,提出了基于改进随机游走的网络表示学习算法。首先,使用RLP-MHRW算法生成节点序列,它在生成节点序列时不会偏向大度节点,得到的节点序列能更好地反映网络结构信息;然后,将节点序列投入到Skip-gram模型得到节点表示向量;最后,利用链路预测任务来测度表示学习性能。在4个真实网络数据集上进行了实验。在论文合作网络arXiv ASTRO-PH上与LINE和node2vec算法相比,链路预测的AUC值分别提升了8.9%和3.5%,其他数据集上也均有提升。实验结果表明,RLP-MHRW能有效提高基于Word2vec的网络表示学习算法的性能。

关键词: 网络表示学习, 随机游走, 链路预测, 无偏采样, 机器学习

CLC Number: