
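Network Embedding based on Node Attribute Bipartite Graph

Le ZHOU (周乐)1, Tingting DAI (代婷婷)2, Chun LI (李淳)3, Jun XIE (谢军)3, Boce CHU (楚博策)4, Feng LI (李峰)4, Junyi ZHANG (张君毅)3, Qiao LIU (刘峤)5

  1. School of Information and Software Engineering, University of Electronic Science and Technology of China
  2. No. 4, Section 2, Jianshe North Road, Chenghua District, Chengdu, Sichuan Province
  3. Hebei Key Laboratory of Electromagnetic Spectrum Cognition and Control
  4. CETC Key Laboratory of Aerospace Information Applications
  5. University of Electronic Science and Technology of China
  • Received: 2021-06-07 Revised: 2021-08-17 Online: 2021-08-17
  • Corresponding author: Le ZHOU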

Abstract: Reasoning over graph-structured data is an important and widespread task. Its main challenge is how to represent networked knowledge so that machines can quickly understand and use graph data. A comparison of existing representation learning models shows that those based on random walks tend to overlook the special role that attributes play in the association between nodes. A hybrid random walk method based on both node adjacency and attribute association is therefore proposed. The basic idea is first to compute attribute weights from the distribution of attributes shared by adjacent nodes, which gives the sampling probability from a node to each of its attributes, and then to extract network information separately from adjacent nodes and from non-adjacent nodes that share attributes. Finally, a network representation learning model based on the node-attribute bipartite graph is constructed to learn node embeddings from the resulting sampling sequences. On the public Flickr, BlogCatalog and Cora datasets, node label classification using these embeddings reaches an average accuracy of 89.07%, which is 2.13 percentage points higher than recent work and 21.34 percentage points higher than classical work. A comparison of different random walk methods also shows that raising the sampling probability of attributes that promote node association increases the information contained in the sampling sequences.

Keywords: network embedding, representation learning, random walk, network sampling, attributed network, node classification
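The abstract describes the sampling procedure only in words, so the following is a minimal sketch of one way such a hybrid walk could look, not the authors' implementation. Everything specific here is an assumption made for illustration: the weight formula (the share of edge endpoints' common attributes accounted for by each attribute), the mixing parameter `p_attr`, the fallback to a neighbour step, and all function names.

```python
import random
from collections import Counter


def attribute_weights(adj, attrs):
    """Assumed weighting: how often an attribute is shared by both ends of an edge.

    `adj` is a symmetric adjacency dict, so every undirected edge is seen twice;
    the factor of two cancels in the normalisation.
    """
    shared = Counter()
    for u, nbrs in adj.items():
        for v in nbrs:
            shared.update(attrs[u] & attrs[v])
    total = sum(shared.values())
    return {a: c / total for a, c in shared.items()} if total else {}


def attribute_probs(node, attrs, weights):
    """Normalise the weights of one node's attributes into sampling probabilities."""
    w = {a: weights.get(a, 0.0) for a in sorted(attrs[node])}
    z = sum(w.values())
    return {a: x / z for a, x in w.items()} if z else {}


def attribute_index(attrs):
    """Map each attribute to the nodes carrying it (the attribute side of the bipartite graph)."""
    members = {}
    for node, node_attrs in attrs.items():
        for a in node_attrs:
            members.setdefault(a, []).append(node)
    return members


def hybrid_walk(start, adj, attrs, weights, members, length, p_attr=0.5, rng=random):
    """Walk that mixes neighbour steps with node -> attribute -> node steps."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbours = list(adj.get(cur, ()))
        candidates = []
        probs = attribute_probs(cur, attrs, weights)
        if probs and rng.random() < p_attr:
            # Attribute step: pick an attribute by its probability, then a
            # non-adjacent node that also carries it.
            a = rng.choices(list(probs), weights=list(probs.values()))[0]
            candidates = [n for n in members[a] if n != cur and n not in neighbours]
        if not candidates:
            # Adjacency step (also the assumed fallback when the attribute step finds nothing).
            candidates = neighbours
        if not candidates:
            break  # isolated node with no usable attribute
        walk.append(rng.choice(candidates))
    return walk


# Toy attributed graph (hypothetical data).
adj = {0: [1], 1: [0, 2], 2: [1], 3: []}
attrs = {0: {"x", "y"}, 1: {"x", "y"}, 2: {"x"}, 3: {"y"}}
weights = attribute_weights(adj, attrs)
members = attribute_index(attrs)
walk = hybrid_walk(0, adj, attrs, weights, members, length=5, rng=random.Random(0))
print(weights, walk)
```

In this toy graph node 3 has no edges, so a purely adjacency-based walk could never reach it, while the attribute step can reach it from node 0 through the shared attribute `y`. Walks generated this way would then serve as the sampling sequences from which node embeddings are learned.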

CLC number: