Recently, the Vision Graph neural network (ViG) has attracted considerable attention from researchers in computer vision, and graph construction is a key modeling step in ViG. The popular K-Nearest Neighbor (KNN) graph construction approach is limited by its fixed scale and quadratic computational complexity, which makes it difficult to model both local and multi-scale information in an image. To address this issue, a multi-scale sparse graph construction method, MSSG (Multi-Scale Sparse Graph), is proposed. In this method, the KNN graph is decomposed along the channel dimension into three sparse subgraphs of different scales, which achieves linear computational complexity while effectively modeling both local and multi-scale information in the image. To enhance the model's global modeling capability, a strategy for fusing global and local multi-scale information is also proposed. Based on these methods, a vision architecture, MSViG (Multi-Scale Vision Graph neural network), is proposed. Image classification experiments on the ImageNet-1K dataset demonstrate that MSViG outperforms the existing ViG. For example, the proposed MSViG-T achieves a Top-1 classification accuracy 2.1 percentage points higher than that of ViG-T, and it also shows significant performance improvements over ViG in downstream vision tasks such as object detection and instance segmentation.
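
To make the quadratic cost of the KNN baseline concrete, the following is a minimal sketch of dense KNN graph construction over patch features. It illustrates only the baseline that the abstract criticizes, not MSSG itself, whose details are not given here. The function name `knn_graph`, the toy features, and the choice of Euclidean distance are assumptions made for illustration.

```python
import math

def knn_graph(features, k):
    """Return, for each node, the indices of its k nearest other nodes.

    features: list of N equal-length feature vectors (one per image patch).
    Every node is compared with every other node, so the number of
    distance evaluations is N * (N - 1), i.e. quadratic in N.
    Euclidean distance is an assumption; other metrics could be used.
    """
    n = len(features)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of nodes")
    neighbors = []
    for i in range(n):
        # Distances from node i to all other nodes (the O(N) inner loop).
        dists = [(math.dist(features[i], features[j]), j)
                 for j in range(n) if j != i]
        dists.sort()
        neighbors.append([j for _, j in dists[:k]])
    return neighbors

# Toy example: four 2-D "patch features" (hypothetical values).
patches = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0), (5.0, 5.0)]
graph = knn_graph(patches, k=2)
```

Because each of the N nodes scans all other nodes, doubling the number of patches roughly quadruples the work, and the neighborhood size is fixed by the single parameter `k`. These are the two limitations that the abstract attributes to KNN graph construction.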