Journal of Computer Applications, 2025, Vol. 45, Issue (7): 2188-2194. DOI: 10.11772/j.issn.1001-9081.2024070910

• Artificial intelligence

Multi-scale sparse graph guided vision graph neural networks

Zimo ZHANG1, Xuezhuan ZHAO2,3,4,5

  1.Faculty of Arts and Social Sciences, National University of Singapore, Singapore 119077, Singapore
    2.National Key Laboratory of Air-based Information Perception and Fusion, Luoyang Henan 471009, China
    3.School of Computer Science, Zhengzhou University of Aeronautics, Zhengzhou Henan 450046, China
    4.Chongqing Research Institute, Harbin Institute of Technology, Chongqing 401151, China
    5.Collaborative Innovation Center of Aerospace Electronic Information Technology of Henan Province (Zhengzhou University of Aeronautics), Zhengzhou Henan 450046, China
  • Received:2024-07-01 Revised:2024-11-06 Accepted:2024-11-11 Online:2025-07-10 Published:2025-07-10
  • Contact: Xuezhuan ZHAO
  • About authors: ZHANG Zimo, born in 2001, M. S. candidate. Her research interests include machine learning and big data analysis.
    ZHAO Xuezhuan, born in 1986, Ph. D., associate professor. His research interests include machine learning and pattern recognition.
  • Supported by:
    Henan Province Key Research and Development Project (231111212000); Project of Henan Center for Outstanding Overseas Scientists (GZS2022011); Aviation Science Foundation (20230001055002); Chongqing Natural Science Foundation (CSTB2023NSCQ-MSX0070); Henan Science and Technology Research Program (232102210054)

Multi-scale sparse graph guided vision graph neural networks

ZHANG Zimo1, ZHAO Xuezhuan2,3,4,5

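  1.Faculty of Arts and Social Sciences, National University of Singapore, Singapore 119077
    2.National Key Laboratory of Air-based Information Perception and Fusion, Luoyang Henan 471009
    3.School of Computer Science, Zhengzhou University of Aeronautics, Zhengzhou 450046
    4.Chongqing Research Institute, Harbin Institute of Technology, Chongqing 401151
    5.Collaborative Innovation Center of Aerospace Electronic Information Technology of Henan Province (Zhengzhou University of Aeronautics), Zhengzhou 450046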
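  • Corresponding author: ZHAO Xuezhuan
  • About authors: ZHANG Zimo, born in 2001, female, native of Zhengzhou, Henan, M. S. candidate. Her main research interests include machine learning and big data analysis.
    ZHAO Xuezhuan, born in 1986, male, native of Puyang, Henan, associate professor, Ph. D., CCF member. His main research interests include machine learning and pattern recognition. zhaoxuezhuan@zua.edu.cn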
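  • Supported by:
    Henan Province Key Research and Development Project (231111212000); Project of Henan Center for Outstanding Overseas Scientists (GZS2022011); Aviation Science Foundation (20230001055002); Chongqing Natural Science Foundation (CSTB2023NSCQ-MSX0070); Henan Science and Technology Research Program (232102210054)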

Abstract:

Recently, the Vision Graph neural network (ViG) has attracted considerable attention from researchers in computer vision, and graph construction is a key modeling approach in ViG. The popular K-Nearest Neighbor (KNN) graph construction approach is limited by a single fixed scale and by computational complexity that is quadratic in the number of nodes, which makes it difficult to model both local and multi-scale information in the image. To address this issue, a multi-scale sparse graph construction method, MSSG (Multi-Scale Sparse Graph), was proposed. In this method, the KNN graph was decomposed along the channel dimension into three sparse subgraphs of different scales, achieving linear computational complexity while effectively modeling both local and multi-scale information in the image. To enhance the model's global modeling capability, a global and local multi-scale information fusion strategy was proposed. Based on these methods, a vision architecture, MSViG (Multi-Scale Vision Graph neural network), was proposed. Image classification experiments on the ImageNet-1K dataset demonstrate that MSViG outperforms the existing ViG. For example, the proposed MSViG-T achieves a Top-1 classification accuracy 2.1 percentage points higher than that of ViG-T, and MSViG also shows significant performance improvements over ViG in downstream vision tasks such as object detection and instance segmentation.
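The abstract contrasts dense KNN graph construction, whose cost grows quadratically with the number of patch nodes, with MSSG's channel-wise split into three sparse subgraphs of different scales that cost only linear time. The NumPy sketch below illustrates that contrast under stated assumptions; it is not the authors' implementation. `knn_graph` follows the usual feature-space KNN construction over patch nodes. `sparse_multiscale_neighbors` and `multiscale_sparse_aggregate` are hypothetical helpers: the fixed grid offsets (scales 1, 2 and 4 with wrap-around), the equal channel split, and the max-relative aggregation rule are all assumptions chosen only to show why a fixed number of neighbors per node yields linear cost.

```python
import numpy as np


def knn_graph(x: np.ndarray, k: int) -> np.ndarray:
    """Dense feature-space KNN graph over N nodes with features x of shape (N, C).

    All N x N pairwise squared distances are materialized, so time and memory
    grow quadratically with N. Returns neighbor indices of shape (N, k).
    """
    sq = np.sum(x * x, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)  # (N, N) squared distances
    return np.argsort(dist, axis=1)[:, :k]  # indices of the k smallest distances per row


def sparse_multiscale_neighbors(h: int, w: int, scales=(1, 2, 4)) -> list:
    """HYPOTHETICAL sparse subgraphs on an h x w patch grid (not the paper's MSSG).

    For each scale s, every patch is linked to the patches s steps up, down,
    left and right, wrapping around the borders. Each node gets a fixed 4
    neighbors per scale, so building every subgraph costs O(N) with N = h * w.
    Returns one (N, 4) index array per scale.
    """
    rows, cols = np.divmod(np.arange(h * w), w)
    subgraphs = []
    for s in scales:
        offsets = [(-s, 0), (s, 0), (0, -s), (0, s)]
        nbrs = [((rows + dr) % h) * w + (cols + dc) % w for dr, dc in offsets]
        subgraphs.append(np.stack(nbrs, axis=1))
    return subgraphs


def multiscale_sparse_aggregate(x: np.ndarray, h: int, w: int, scales=(1, 2, 4)) -> np.ndarray:
    """HYPOTHETICAL channel-split aggregation over the sparse subgraphs above.

    Channels are split into len(scales) groups; group g aggregates only over
    the scale-g subgraph with an assumed max-relative rule max_j (x_j - x_i).
    The output has the same shape (N, C) as x.
    """
    groups = np.array_split(x, len(scales), axis=1)
    parts = []
    for xg, nbrs in zip(groups, sparse_multiscale_neighbors(h, w, scales)):
        rel = xg[nbrs] - xg[:, None, :]  # (N, 4, C_g): neighbor minus center
        parts.append(rel.max(axis=1))    # (N, C_g)
    return np.concatenate(parts, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h, w, c = 14, 14, 48  # 196 patch nodes, 48 channels
    x = rng.standard_normal((h * w, c))
    print(knn_graph(x, k=9).shape)                     # (196, 9)
    print(multiscale_sparse_aggregate(x, h, w).shape)  # (196, 48)
```

Because each node in the hypothetical sparse graph has exactly 4 neighbors per scale, building and aggregating over it takes O(N) time for N patches, whereas `knn_graph` materializes an N x N distance matrix before selecting neighbors.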

Key words: Graph Neural Network (GNN), Vision Graph neural network (ViG), vision backbone, image classification, object detection, instance segmentation

Abstract:

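In recent years, the Vision Graph neural network (ViG) has attracted wide attention from researchers in computer vision, and graph construction is an important modeling approach in ViG. The currently popular K-Nearest Neighbor (KNN) graph construction method uses a single scale, has quadratic computational complexity, and struggles to model local and multi-scale information in images. To address this problem, a multi-scale sparse graph construction method, MSSG (Multi-Scale Sparse Graph), was proposed. This method decomposes the KNN graph along the channel dimension into three sparse subgraphs of different scales, which has linear computational complexity and enables effective modeling of both local and multi-scale information in images. To enhance the model's global modeling capability, a global and local multi-scale information fusion strategy was proposed. Based on these methods, a vision architecture, MSViG (Multi-Scale Vision Graph neural network), was proposed. The results of image classification experiments on the ImageNet-1K dataset show that MSViG outperforms the traditional ViG. For example, compared with the vision graph neural network ViG-T, the proposed MSViG-T improves Top-1 classification accuracy by 2.1 percentage points, and MSViG achieves substantial performance gains over the traditional ViG on the downstream vision tasks of object detection and instance segmentation.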

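Key words: graph neural network, vision graph neural network, vision backbone network, image classification, object detection, instance segmentation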
