When multiple feature modalities are fused, noise from the different modalities is superimposed, and the cascaded structures commonly used to reduce inter-modal differences do not fully exploit the feature information between modalities. To address these issues, a cross-modal Dual-stream Alternating Interactive Network (DAINet) method was proposed. Firstly, a Dual-stream Alternating Enhancement (DAE) module was constructed to fuse modal features in an interactive dual-branch way; by learning the mapping relationships between modalities and applying bidirectional InfraRed-VISible-InfraRed (IR-VIS-IR) and VISible-InfraRed-VISible (VIS-IR-VIS) feedback adjustments, cross-suppression of inter-modal noise was realized. Secondly, a Cross-Modal Feature Interaction (CMFI) module was constructed, in which a residual structure was introduced to integrate low-level and high-level features within and between the infrared and visible modalities, thereby minimizing inter-modal differences and maximizing the utilization of inter-modal features. Finally, the effectiveness of the DAE and CMFI modules was verified on a self-constructed infrared-visible multi-modal typhoon dataset and a publicly available RGB-NIR multi-modal dataset. Experimental results demonstrate that, compared with the simple cascading fusion method on the self-constructed typhoon dataset, the proposed DAINet-based feature fusion method improves the overall classification accuracy by 6.61 and 3.93 percentage points for the infrared and visible modalities, respectively, and increases the G-mean by 6.24 and 2.48 percentage points, respectively. These results highlight the generalizability of the proposed method to class-imbalanced classification tasks. On the RGB-NIR dataset, the proposed method improves the overall classification accuracy by 13.47 and 13.90 percentage points for the two test modalities, respectively. In addition, compared with the IFCNN (general Image Fusion framework based on Convolutional Neural Network) and DenseFuse methods, the proposed method improves the overall classification accuracy by 9.82 and 6.02 percentage points, and by 17.38 and 1.68 percentage points, for the two test modalities on the self-constructed typhoon dataset.
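To make the described architecture concrete, the following is a minimal sketch of the dual-branch idea in PyTorch-style Python, covering a DAE-like block (cross-modal round-trip mappings used to gate and suppress noise in the other stream) and a CMFI-like block (residual fusion of low- and high-level features from both modalities). The module names, channel sizes, and the sigmoid-gating operator are illustrative assumptions for exposition, not the authors' released implementation.

```python
# Illustrative sketch only: layer choices and gating are assumptions, not the paper's exact design.
import torch
import torch.nn as nn

class DAE(nn.Module):
    """Dual-stream Alternating Enhancement (sketch).
    Learns IR<->VIS mappings and uses IR-VIS-IR / VIS-IR-VIS round trips
    as feedback signals to cross-suppress modality-specific noise."""
    def __init__(self, channels=64):
        super().__init__()
        self.ir2vis = nn.Conv2d(channels, channels, 3, padding=1)  # IR -> VIS mapping
        self.vis2ir = nn.Conv2d(channels, channels, 3, padding=1)  # VIS -> IR mapping
        self.gate_ir = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.gate_vis = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, f_ir, f_vis):
        # Bidirectional round trips: IR -> VIS -> IR and VIS -> IR -> VIS
        ir_roundtrip = self.vis2ir(self.ir2vis(f_ir))
        vis_roundtrip = self.ir2vis(self.vis2ir(f_vis))
        # Gate each stream with the other stream's round-trip features (cross-suppression),
        # keeping a residual path so original information is preserved
        f_ir_enh = f_ir * self.gate_ir(vis_roundtrip) + f_ir
        f_vis_enh = f_vis * self.gate_vis(ir_roundtrip) + f_vis
        return f_ir_enh, f_vis_enh

class CMFI(nn.Module):
    """Cross-Modal Feature Interaction (sketch): residual integration of
    low-level and high-level features within and between the two modalities."""
    def __init__(self, channels=64):
        super().__init__()
        self.fuse = nn.Conv2d(4 * channels, channels, 1)

    def forward(self, ir_low, ir_high, vis_low, vis_high):
        fused = self.fuse(torch.cat([ir_low, ir_high, vis_low, vis_high], dim=1))
        # Residual connection keeps high-level semantics while injecting low-level detail
        return fused + ir_high + vis_high

# Usage with dummy feature maps (batch of 2, 64 channels, 56x56 spatial size)
dae, cmfi = DAE(), CMFI()
f_ir, f_vis = torch.randn(2, 64, 56, 56), torch.randn(2, 64, 56, 56)
ir_enh, vis_enh = dae(f_ir, f_vis)
fused = cmfi(f_ir, ir_enh, f_vis, vis_enh)  # -> (2, 64, 56, 56), fed to the classifier head
```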