Subgraph-aware contrastive learning with data augmentation

李玟1,李开荣2,杨凯1   

  1. Yangzhou University, 196 Huayang West Road, Hanjiang District, Yangzhou, Jiangsu, China
  2. Yangzhou University
  • Received: 2025-02-07  Revised: 2025-04-01  Online: 2025-04-27  Published: 2025-04-27
  • Corresponding author: 李开荣
  • Supported by:
    National Natural Science Foundation of China

Abstract: Graph Neural Networks (GNNs) are effective graph representation methods for processing graph-structured data. In practical applications, however, GNN performance is limited by missing information. On the one hand, graph structures are usually sparse, which makes it difficult for the model to learn node features adequately; on the other hand, the label data on which supervised learning relies are often scarce, which constrains model training and makes it hard to obtain robust node representations. To address these problems, a Subgraph-aware Contrastive Learning with Data Augmentation (SCLDA) model is proposed. First, relationship scores between node pairs are obtained by learning the original graph through link prediction, and the highest-scoring edges are added to the original graph to generate an augmented graph. Second, local subgraphs centered on target nodes are sampled from the original graph and the augmented graph, respectively, and the sampled subgraphs are fed into a shared GNN encoder to generate subgraph-level embeddings of the target nodes. Finally, contrastive learning over the target-node embeddings from the two subgraph views maximizes the mutual information between similar instances. In node-classification experiments on six public datasets (Cora, Citeseer, Pubmed, Cora_ML, DBLP, and Photo), SCLDA improves accuracy over the traditional GCN model by about 4.4%, 6.3%, 4.5%, 7.0%, 13.2%, and 9.3%, respectively.
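
Note: the abstract above describes a three-step pipeline (link-prediction-based edge augmentation, subgraph sampling around target nodes, and subgraph-level contrastive learning). The sketch below is only a minimal illustration of that data flow in plain PyTorch and is not the authors' implementation; the one-layer GCN encoder, the inner-product link scorer, the 1-hop subgraph sampler, the InfoNCE-style loss, and all hyperparameters (number of added edges k, temperature tau) are assumptions introduced here for illustration.

```python
# Minimal, illustrative sketch of the SCLDA pipeline described in the abstract.
# NOT the authors' implementation: encoder, link scorer, subgraph sampler and
# every hyperparameter below are assumptions made for illustration only.

import torch
import torch.nn.functional as F

torch.manual_seed(0)

# --- Toy graph: N nodes, random features, sparse symmetric adjacency ----------
N, F_IN, F_OUT = 8, 5, 4
x = torch.randn(N, F_IN)
adj = (torch.rand(N, N) < 0.2).float()
adj = ((adj + adj.t()) > 0).float()
adj.fill_diagonal_(0)

class GCNEncoder(torch.nn.Module):
    """One-layer GCN-style encoder (hypothetical stand-in for the shared encoder)."""
    def __init__(self, f_in, f_out):
        super().__init__()
        self.lin = torch.nn.Linear(f_in, f_out)

    def forward(self, x, adj):
        a_hat = adj + torch.eye(adj.size(0))        # add self-loops
        d_inv_sqrt = torch.diag(a_hat.sum(dim=1).pow(-0.5))
        a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt    # symmetric normalization
        return F.relu(self.lin(a_norm @ x))

encoder = GCNEncoder(F_IN, F_OUT)                   # shared across both views

# --- Step 1: link-prediction-style augmentation --------------------------------
# Score every non-edge with an inner product of encoder embeddings, then add the
# k highest-scoring edges to obtain the augmented graph.
with torch.no_grad():
    z = encoder(x, adj)
    scores = z @ z.t()
    scores[adj.bool()] = float("-inf")              # ignore existing edges
    scores.fill_diagonal_(float("-inf"))
    k = 4
    top = torch.topk(scores.flatten(), k).indices
    aug_adj = adj.clone()
    aug_adj[top // N, top % N] = 1.0
    aug_adj = ((aug_adj + aug_adj.t()) > 0).float() # keep it symmetric

# --- Step 2: sample a local (1-hop) subgraph around each target node -----------
def local_subgraph(adj_view, target):
    """Node indices of the target's 1-hop neighbourhood, including itself."""
    mask = adj_view[target].bool().clone()
    mask[target] = True
    return mask.nonzero(as_tuple=True)[0]

def subgraph_embedding(adj_view, target):
    """Encode the target's local subgraph and return the target-node row."""
    nodes = local_subgraph(adj_view, target)
    sub_adj = adj_view[nodes][:, nodes]
    z_sub = encoder(x[nodes], sub_adj)
    return z_sub[(nodes == target).nonzero(as_tuple=True)[0][0]]

# --- Step 3: contrastive objective between the two views -----------------------
def info_nce(z1, z2, tau=0.5):
    """InfoNCE-style loss: matching target nodes across views are positives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau
    labels = torch.arange(z1.size(0))
    return F.cross_entropy(logits, labels)

z_orig = torch.stack([subgraph_embedding(adj, t) for t in range(N)])
z_aug = torch.stack([subgraph_embedding(aug_adj, t) for t in range(N)])
loss = info_nce(z_orig, z_aug)
print(f"contrastive loss: {loss.item():.4f}")
```

The sketch only fixes the data flow (original graph, scored candidate edges, augmented graph, two subgraph views per target node, shared encoder, contrastive loss); the actual link predictor, sampling radius, and objective used by SCLDA may differ.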

Key words: graph representation learning, graph neural network, data augmentation, self-supervised learning, graph contrastive learning, node classification

CLC number: