Journal of Computer Applications ›› 2026, Vol. 46 ›› Issue (1): 1-9. DOI: 10.11772/j.issn.1001-9081.2025010110

• Artificial intelligence •

Subgraph-aware contrastive learning with data augmentation

Wen LI, Kairong LI, Kai YANG

  1. College of Information Engineering, Yangzhou University, Yangzhou, Jiangsu 225127, China
  • Received: 2025-02-07 Revised: 2025-04-01 Accepted: 2025-04-02 Online: 2026-01-10 Published: 2026-01-10
  • Contact: Kairong LI
  • About author: LI Wen, born in 1999 in Fuding, Fujian, M.S. candidate. Her research interests include graph neural networks and machine learning.
    YANG Kai, born in 1987 in Tai'an, Shandong, Ph.D., lecturer. His research interests include graph neural networks and network science.
  • Supported by:
    National Natural Science Foundation of China (61872313); Jiangsu J-TOP Innovation Challenge Season Project (T-2023-2023-0410); Natural Science Research Project of Jiangsu Higher Education Institutions (22KJD120002)

Abstract:

Graph Neural Network (GNN) is an effective graph representation learning method for processing graph-structured data. However, the performance of GNN in practical applications is limited by the problem of missing information. On the one hand, the graph structure is usually sparse, making it difficult for the model to learn node features adequately. On the other hand, supervised learning relies on labeled data that are usually scarce, so model training is constrained and robust node representations are difficult to obtain. To address these problems, a Subgraph-aware Contrastive Learning with Data Augmentation (SCLDA) model was proposed. Firstly, relationship scores between nodes were obtained by applying link prediction to the original graph, and the highest-scoring edges were added to the original graph to generate an augmented graph. Secondly, local subgraphs centered on the target nodes were sampled from the original graph and the augmented graph respectively, and the sampled subgraphs were fed into a shared GNN encoder to generate subgraph-level embeddings of the target nodes. Finally, contrastive learning was performed on the target-node embeddings from the two subgraph views to maximize the mutual information between similar instances. Experimental results of node classification on six public datasets, Cora, Citeseer, Pubmed, Cora_ML, DBLP, and Photo, show that the SCLDA model improves accuracy over the traditional GCN (Graph Convolutional Network) model by about 4.4%, 6.3%, 4.5%, 7.0%, 13.2%, and 9.3%, respectively.
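
To make the pipeline described in the abstract concrete, the following is a minimal, self-contained PyTorch sketch of the augmentation-plus-contrastive step, not the authors' implementation: it assumes a simple dot-product scorer standing in for the link predictor, a dense two-layer GCN-style shared encoder, and an InfoNCE loss between the two views, and it omits the local subgraph sampling around target nodes for brevity. All names (SharedGCNEncoder, augment_by_link_prediction, info_nce) and hyperparameters (k, tau, layer sizes) are illustrative assumptions.

# Hypothetical sketch of an SCLDA-style pipeline; illustrative names and
# hyperparameters, not the authors' released code.
import torch
import torch.nn.functional as F


class SharedGCNEncoder(torch.nn.Module):
    """Two-layer GCN-style encoder shared by both graph views (dense adjacency for simplicity)."""

    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.w1 = torch.nn.Linear(in_dim, hid_dim)
        self.w2 = torch.nn.Linear(hid_dim, out_dim)

    @staticmethod
    def normalize(adj):
        # Symmetrically normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2.
        a = adj + torch.eye(adj.size(0))
        d = a.sum(dim=1).pow(-0.5)
        return d.unsqueeze(1) * a * d.unsqueeze(0)

    def forward(self, x, adj):
        a = self.normalize(adj)
        h = F.relu(a @ self.w1(x))
        return a @ self.w2(h)


def augment_by_link_prediction(x, adj, k=10):
    """Score candidate edges and add the k highest-scoring missing edges to the graph."""
    n = adj.size(0)
    scores = x @ x.t()  # crude dot-product stand-in for a trained link predictor
    # Mask existing edges and self-loops so only new edges can be added.
    scores[(adj > 0) | torch.eye(n, dtype=torch.bool)] = float("-inf")
    idx = torch.topk(scores.flatten(), k).indices
    rows, cols = idx // n, idx % n
    aug = adj.clone()
    aug[rows, cols] = 1.0
    return ((aug + aug.t()) > 0).float()  # keep the augmented graph undirected


def info_nce(z1, z2, tau=0.5):
    """InfoNCE contrastive loss: the same target node in the two views forms the positive pair."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau
    labels = torch.arange(z1.size(0))
    return F.cross_entropy(logits, labels)


# Toy usage on a random graph standing in for a citation network such as Cora.
n, d = 100, 32
x = torch.randn(n, d)
adj = (torch.rand(n, n) < 0.05).float()
adj = ((adj + adj.t()) > 0).float()

encoder = SharedGCNEncoder(d, 64, 32)
adj_aug = augment_by_link_prediction(x, adj, k=20)
z_orig = encoder(x, adj)       # view 1: original graph
z_aug = encoder(x, adj_aug)    # view 2: augmented graph
loss = info_nce(z_orig, z_aug)
loss.backward()
print(float(loss))

In the full pipeline described above, the shared encoder would be applied to sampled local subgraphs of each target node rather than to the whole graph, and the added edges would come from a trained link predictor rather than raw feature similarity.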

Key words: graph representation learning, Graph Neural Network (GNN), data augmentation, self-supervised learning, Graph Contrastive Learning (GCL), node classification
