To address three problems in multi-modal vestibular schwannoma recognition, namely that corresponding features from different modalities are easily misaligned during fusion, that model parameters are tuned subjectively according to expert experience, and that computational cost is high, a self-optimized dual-modal (“contrast-enhanced T1-weighted” and “high-resolution enhanced T2-weighted”) multi-channel non-deep vestibular schwannoma recognition model was proposed. First, a vestibular schwannoma recognition model was constructed to further exploit the multi-modal image features of vestibular schwannoma and the complex nonlinear complementary information among the modalities. Then, a model optimization strategy using a game-theory-based global parallel sparrow search algorithm was designed to adaptively optimize the key hyperparameters of the model and thereby improve its recognition performance. Experimental results show that, compared with the deep learning-based model, the proposed model reduces the number of parameters by 27.9% while improving recognition accuracy by 4.19 percentage points, which verifies the effectiveness and adaptability of the proposed model.
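
To illustrate the idea of population-based hyperparameter optimization mentioned above, the following is a minimal sketch of a simplified, single-process sparrow-style search. It is not the game-theory-based global parallel variant of the proposed strategy, whose details are not given here. The function names (`sparrow_search`, `toy_validation_loss`), the two hyperparameters (log learning rate and number of hidden units), the toy loss, and all default settings are assumptions made purely for illustration.

```python
import math
import random


def sparrow_search(loss, bounds, n_sparrows=20, n_iters=50,
                   producer_frac=0.2, scout_frac=0.1, safety=0.8, seed=0):
    """Simplified sparrow-style search (illustrative assumption, not the paper's method).

    loss   : callable mapping a list of hyperparameter values to a float to minimize
    bounds : list of (low, high) pairs, one per hyperparameter
    """
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(x):
        # Keep every coordinate inside its allowed range.
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_sparrows)]
    fit = [loss(x) for x in pop]
    best_i = min(range(n_sparrows), key=fit.__getitem__)
    best_x, best_f = pop[best_i][:], fit[best_i]
    history = [best_f]  # best-so-far loss after initialization and each iteration

    n_prod = max(1, int(producer_frac * n_sparrows))
    n_scout = max(1, int(scout_frac * n_sparrows))

    for _ in range(n_iters):
        # Rank the population from best (index 0) to worst.
        order = sorted(range(n_sparrows), key=fit.__getitem__)
        pop = [pop[i] for i in order]
        fit = [fit[i] for i in order]
        worst = pop[-1][:]
        alarm = rng.random()

        # Producers: contract their position if safe, otherwise take a random jump.
        for i in range(n_prod):
            if alarm < safety:
                scale = math.exp(-(i + 1) / (rng.uniform(1e-3, 1.0) * n_iters))
                pop[i] = clip([v * scale for v in pop[i]])
            else:
                step = rng.gauss(0.0, 1.0)
                pop[i] = clip([v + step for v in pop[i]])
        lead = pop[0][:]

        # Scroungers: the worse half relocates, the rest follow the leading producer.
        for i in range(n_prod, n_sparrows):
            if i + 1 > n_sparrows / 2:
                q = rng.gauss(0.0, 1.0)
                pop[i] = clip([q * math.exp((w - v) / (i + 1) ** 2)
                               for v, w in zip(pop[i], worst)])
            else:
                pop[i] = clip([p + abs(v - p) * rng.choice((-1, 1)) / dim
                               for v, p in zip(pop[i], lead)])

        # Scouts: a random subset moves relative to the best position found so far.
        for i in rng.sample(range(n_sparrows), n_scout):
            pop[i] = clip([b + rng.gauss(0.0, 1.0) * abs(v - b)
                           for v, b in zip(pop[i], best_x)])

        fit = [loss(x) for x in pop]
        i_min = min(range(n_sparrows), key=fit.__getitem__)
        if fit[i_min] < best_f:
            best_x, best_f = pop[i_min][:], fit[i_min]
        history.append(best_f)

    return best_x, best_f, history


def toy_validation_loss(x):
    # Hypothetical stand-in for a validation loss; a real run would train
    # and evaluate the recognition model for each candidate setting.
    log_lr, hidden_units = x
    return (log_lr + 3.0) ** 2 + ((hidden_units - 64.0) / 64.0) ** 2


bounds = [(-5.0, -1.0), (8.0, 256.0)]  # assumed search ranges
best_x, best_f, history = sparrow_search(toy_validation_loss, bounds)
print("best hyperparameters:", best_x, "loss:", best_f)
```

In this sketch the best-so-far solution is stored separately from the population, so the recorded loss can never increase between iterations. In practice, each call to `loss` would correspond to one training and validation run of the recognition model, which is the costly step that a parallel variant would distribute.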