Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (2): 392-402. DOI: 10.11772/j.issn.1001-9081.2024030266

• Artificial intelligence •

Graph data augmentation method for few-shot node classification

Kun FU1, Shicong YING1, Tingting ZHENG2,3, Jiajie QU2,3, Jingyuan CUI1, Jianwei LI1

  1. School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China
  2. Tianjin Institute of Aerospace Mechanical and Electrical Equipment, Tianjin 300462, China
  3. Tianjin Key Laboratory of Aerospace Intelligent Equipment Technology (Tianjin Institute of Aerospace Mechanical and Electrical Equipment), Tianjin 300462, China
  • Received: 2024-03-13 Revised: 2024-06-04 Accepted: 2024-06-11 Online: 2024-08-02 Published: 2025-02-10
  • Contact: Kun FU
  • About the authors: YING Shicong, born in 1997, M. S. candidate. His research interests include network representation learning.
    ZHENG Tingting, born in 1988, M. S., engineer. Her research interests include multi-source heterogeneous data analysis.
    QU Jiajie, born in 1992, M. S., engineer. Her research interests include multi-source heterogeneous data analysis.
    CUI Jingyuan, born in 1998, M. S. candidate. His research interests include network representation learning.
    LI Jianwei, born in 1974, Ph. D., professor. His research interests include bioinformatics and graph convolutional neural networks.
  • Supported by:
    National Natural Science Foundation of China (62072154); Tianjin Science and Technology Plan Project (22JCYBJC01740); Hebei Province Major Scientific and Technological Achievement Transformation Fund Support Project (22280803Z)

面向小样本节点分类的图数据增强方法

富坤1, 应世聪1, 郑婷婷2,3, 屈佳捷2,3, 崔静远1, 李建伟1

  1. 河北工业大学 人工智能与数据科学学院,天津 300401
    2.天津航天机电设备研究所,天津 300462
    3.天津市宇航智能装备技术企业重点实验室(天津航天机电设备研究所),天津 300462
  • 通讯作者: 富坤
  • 作者简介:应世聪(1997—),男,河南漯河人,硕士研究生,主要研究方向:网络表示学习
    郑婷婷(1988—),女,天津人,工程师,硕士,主要研究方向:多源异构数据分析
    屈佳捷(1992—),女,天津人,工程师,硕士,主要研究方向:多源异构数据分析
    崔静远(1998—),男,河北邯郸人,硕士研究生,主要研究方向:网络表示学习
    李建伟(1974—),男,河北唐山人,教授,博士,主要研究方向:生物信息学、图卷积神经网络。
  • 基金资助:
    国家自然科学基金资助项目(62072154);天津市科技计划项目(22JCYBJC01740);河北省重大科技成果转化专项(22280803Z)

Abstract:

Graph-structured data are widespread in the real world, but practical applications often suffer from a shortage of labeled data. Few-Shot Learning (FSL) methods for graph data aim to classify data using only a few labeled samples. Although these methods perform well on Few-Shot Node Classification (FSNC) tasks, the following problems remain: high-quality labeled data are difficult to obtain, generalization ability is insufficient in the parameter initialization process, and the topological structure information in the graph is not fully exploited. To address these problems, a Few-Shot Node Classification model based on Graph Data Augmentation (GDA-FSNC) was proposed. GDA-FSNC consists of four modules: a graph data pre-processing module based on structural similarity, a parameter initialization module, a parameter fine-tuning module, and an adaptive pseudo-label generation module. In the graph data pre-processing module, an adjacency matrix enhancement method based on structural similarity was used to obtain more graph structural information. In the parameter initialization module, a mutual teaching-based data augmentation method was used so that each model learns different patterns and features from the other models, which enhances the diversity of information during training. In the adaptive pseudo-label generation module, an appropriate pseudo-label generation technique was selected automatically according to the characteristics of each dataset, thereby generating high-quality pseudo-label data. Experimental results on seven real datasets show that the proposed model achieves higher classification accuracy than state-of-the-art FSL models such as Meta-GNN, GPN (Graph Prototypical Network), and IA-FSNC (Information Augmentation for Few-Shot Node Classification). For example, compared with the baseline model IA-FSNC, the classification accuracy of the proposed model improved by at least 0.27 percentage points in the 2-way 1-shot setting on the small datasets and by at least 2.06 percentage points in the 5-way 1-shot setting on the large datasets. These results indicate that GDA-FSNC has better classification performance and generalization ability in few-shot scenarios.
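
The adjacency matrix enhancement mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which structural similarity measure GDA-FSNC uses, so the Jaccard similarity of neighbor sets, the function name `jaccard_augment`, and the `threshold` parameter below are all assumptions chosen for illustration and are not the authors' implementation.

```python
from itertools import combinations


def jaccard_augment(adj, threshold=0.6):
    """Return a copy of `adj` with edges added between structurally similar nodes.

    adj: dict mapping each node to the set of its neighbors (undirected graph).
    threshold: hypothetical cutoff (an assumption, not from the paper); an
    unconnected pair gets an edge when the Jaccard similarity of its
    neighbor sets is at least this value.
    """
    augmented = {node: set(neighbors) for node, neighbors in adj.items()}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # the pair is already connected
        union = adj[u] | adj[v]
        if not union:
            continue  # both nodes are isolated, similarity is undefined
        similarity = len(adj[u] & adj[v]) / len(union)
        if similarity >= threshold:
            augmented[u].add(v)
            augmented[v].add(u)
    return augmented


# Toy graph: nodes 0 and 1 share exactly the same neighbors but are not linked.
graph = {
    0: {2, 3},
    1: {2, 3},
    2: {0, 1, 4},
    3: {0, 1},
    4: {2},
}
augmented = jaccard_augment(graph, threshold=0.6)
print(sorted(augmented[0]))  # [1, 2, 3]
```

In this toy graph, nodes 0 and 1 have identical neighborhoods (similarity 1.0) and nodes 2 and 3 share two of three neighbors (similarity about 0.67), so both pairs gain an edge, while node 4 is left unchanged. Similarities are computed on the original graph only, so newly added edges do not influence later pairs.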

Key words: node classification, Graph Convolutional Network (GCN), data augmentation, meta-learning, Few-Shot Learning (FSL)

摘要:

现实中,图结构数据广泛存在,然而,在实际应用中,这些数据常面临标注数据短缺的难题。图数据的小样本学习(FSL)方法旨在以较少的标注样本实现数据的分类。尽管这些方法在小样本节点分类(FSNC)任务上获得较好的性能,但还存在以下问题:高质量的标签数据难获取,参数初始化过程泛化能力不足,未能充分挖掘图中的拓扑结构信息。为解决这些问题,提出一种基于图数据增强的小样本节点分类模型(GDA-FSNC)。GDA-FSNC由4个模块构成:基于结构相似度的图数据预处理模块、参数初始化模块、参数微调模块和自适应伪标签生成模块。在图数据预处理模块中,通过基于结构相似度的邻接矩阵增强方法获取更多的图结构信息;在参数初始化模块中,使用互相教学的数据增强方法使每个模型都能从其他模型学到不同的模式和特征,增强信息的多样性;在自适应伪标签生成模块中,根据不同数据集的特征自动选择合适的伪标签生成技术,以生成高质量的伪标签数据。在7个真实数据集上的实验结果表明,GDA-FSNC的分类准确率超过了Meta-GNN、GPN(Graph Prototypical Network)、IA-FSNC(Information Augmentation for Few-Shot Node Classification)等主流的FSL模型。例如,相较于基线模型IA-FSNC,所提模型的分类准确率在小数据集2-way 1-shot设置下至少提升了0.27个百分点,在大数据集5-way 1-shot设置下至少提升了2.06个百分点。可见,GDA-FSNC在小样本场景下有更好的分类性能和泛化能力。

关键词: 节点分类, 图卷积网络, 数据增强, 元学习, 小样本学习

CLC Number: