Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (3): 663-670.DOI: 10.11772/j.issn.1001-9081.2023030353

• Artificial intelligence •

Feature selection method for graph neural network based on network architecture design

Dapeng XU1(), Xinmin HOU1,2   

  1. School of Mathematical Sciences, University of Science and Technology of China, Hefei, Anhui 230026, China
    2. CAS Key Laboratory of Wu Wen-Tsun Mathematics (University of Science and Technology of China), Hefei, Anhui 230026, China
  • Received:2023-04-03 Revised:2023-05-08 Accepted:2023-05-09 Online:2023-05-30 Published:2024-03-10
  • Contact: Dapeng XU
  • About author: HOU Xinmin, born in 1972, Ph.D., professor. His research interests include graph theory and its applications, complex networks, and graph neural networks.
  • Supported by:
    National Natural Science Foundation of China(12071453);National Key Research and Development Program(2020YFA0713100)


Abstract:

In recent years, researchers have proposed many improved architecture designs for Graph Neural Networks (GNN), driving performance gains on various prediction tasks. However, most GNN variants start from the assumption that all node features are equally important, which is rarely the case. To address this problem, a feature selection method was proposed to improve existing models and to select an important feature subset for a dataset. The proposed method consists of two components: a feature selection layer and a separate label-feature mapping. In the feature selection layer, a Softmax normalizer and a feature "soft selector" perform the feature selection; under the idea of separate label-feature mapping, the model structure is designed to select a subset of related features for each label, and the union of these per-label subsets yields the important feature subset of the dataset. Graph ATtention network (GAT) and GATv2 were chosen as benchmark models, and the algorithm was applied to them to obtain new models. Experimental results show that, on node classification tasks over six datasets, the proposed models improve accuracy by 0.83% to 8.79% compared with the benchmark models. The new models also select a corresponding important feature subset for each of the six datasets, whose size accounts for 3.94% to 12.86% of the total number of features in the respective dataset. Using an important feature subset as the new input of a benchmark model still achieves more than 95% of the accuracy obtained with all features; that is, the model scale is reduced while accuracy is maintained. Thus, the proposed algorithm improves node classification accuracy and effectively selects a corresponding important feature subset for a dataset.
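The paper provides no code; the following is a minimal, hypothetical numpy sketch of the two components the abstract describes (all class and variable names are invented, and the learnable scores are left untrained for illustration): a "soft selector" that scales each feature column by a Softmax-normalized score, and a union of per-label top-k feature subsets to form the dataset-level important subset.

```python
import numpy as np

def softmax(w):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(w - w.max())
    return e / e.sum()

class SoftFeatureSelector:
    """Hypothetical feature selection layer: weights every feature
    column by a softmax-normalized learnable score ("soft" selection,
    rather than hard pruning)."""
    def __init__(self, num_features, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(size=num_features)  # learnable selection scores

    def forward(self, X):
        # scale each feature column of X (n_nodes x n_features)
        return X * softmax(self.w)

    def top_k(self, k):
        # indices of the k features with the largest selection weights
        return np.argsort(softmax(self.w))[::-1][:k].tolist()

# "Separate label-feature mapping": one selector per label, then the
# union of the per-label subsets gives the important feature subset.
selectors = [SoftFeatureSelector(8, seed=s) for s in range(3)]
subsets = [set(sel.top_k(2)) for sel in selectors]
important_features = sorted(set().union(*subsets))
```

In the actual method the selection scores would be trained jointly with the GAT/GATv2 backbone; here they are random placeholders only to show the data flow.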

Key words: Graph Neural Network (GNN), Graph ATtention network (GAT), feature selection, node classification, deep learning

