Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (1): 275-283. DOI: 10.11772/j.issn.1001-9081.2024010026

• Multimedia Computing and Computer Simulation •

Cross-modal dual-stream alternating interactive network for infrared-visible image classification

Zongsheng ZHENG1, Jia DU1, Yuhe CHENG1, Zecheng ZHAO1, Yuewei ZHANG2, Xulong WANG3

  1. College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
    2. Guangzhou Meteorological Satellite Ground Station, Guangzhou, Guangdong 510650, China
    3. Shandong Provincial Institute of Land Space Data and Remote Sensing Technology (Shandong Provincial Marine Dynamic Monitoring Center), Jinan, Shandong 250014, China
  • Received: 2024-01-15 Revised: 2024-03-26 Accepted: 2024-04-01 Online: 2024-05-09 Published: 2025-01-10
  • Contact: Jia DU
  • About author:ZHENG Zongsheng, born in 1979 in Tangshan, Hebei, Ph. D., associate professor. His research interests include deep learning and remote sensing image processing.
    CHENG Yuhe, born in 2000 in Bortala, Xinjiang, M. S. candidate. Her research interests include deep learning and remote sensing image processing.
    ZHAO Zecheng, born in 1999 in Tai'an, Shandong, M. S. candidate. His research interests include deep learning and ocean wave prediction.
    ZHANG Yuewei, born in 1977 in Boluo, Guangdong, engineer. His research interests include geographic information science and computer science.
    WANG Xulong, born in 1972 in Dongping, Shandong, M. S., senior engineer. His research interests include remote sensing and geographic information.
  • Supported by:
    National Natural Science Foundation of China (41671431); Shanghai Municipal Science and Technology Commission Local Colleges Capacity Building Project (19050502100); Guangzhou Meteorological Satellite Ground Station Project (D-8006-23-0157)

Abstract:

When multiple feature modalities are fused, noise from each modality is superimposed, and the cascaded structures commonly used to reduce inter-modal differences do not fully exploit the feature information between modalities. To address these issues, a cross-modal Dual-stream Alternating Interactive Network (DAINet) method was proposed. Firstly, a Dual-stream Alternating Enhancement (DAE) module was constructed to fuse modal features in an interactive dual-branch manner. By learning the mapping relationships between modal data and applying bidirectional feedback adjustment along InfRared-VISible-InfRared (IR-VIS-IR) and VISible-InfRared-VISible (VIS-IR-VIS) paths, cross suppression of inter-modal noise was realized. Secondly, a Cross-Modal Feature Interaction (CMFI) module was constructed, and a residual structure was introduced to effectively fuse low-level and high-level features within and between the infrared and visible modalities, thereby reducing inter-modal differences and making full use of inter-modal feature information. Finally, experiments were conducted on a self-constructed infrared-visible multi-modal typhoon dataset and the public RGB-NIR multi-modal scene dataset to verify the effectiveness of the DAE and CMFI modules. Experimental results demonstrate that, compared with the simple cascading fusion method, the proposed DAINet-based feature fusion method improves the overall classification accuracy on the self-constructed typhoon dataset by 6.61 and 3.93 percentage points for the infrared and visible modalities, respectively, and raises the G-mean by 6.24 and 2.48 percentage points, respectively, indicating the generalizability of the proposed method to class-imbalanced classification tasks. On the RGB-NIR dataset, the proposed method achieves overall classification accuracy improvements of 13.47 and 13.90 percentage points for the two test modalities, respectively. Comparison experiments with the IFCNN (general Image Fusion framework based on Convolutional Neural Network) and DenseFuse methods on the two datasets further show that the proposed method improves the overall classification accuracy by 9.82 and 6.02, and by 17.38 and 1.68 percentage points for the two test modalities on the self-constructed typhoon dataset.
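Since the abstract only outlines the two modules at a high level, the following is a minimal PyTorch-style sketch of how the described data flow could be organized. All class names, layer choices (3×3 and 1×1 convolutions), channel sizes, and the specific enhancement and fusion operations are assumptions made for illustration; they are not taken from the authors' implementation.

```python
# Hypothetical sketch (not the authors' code) of the two components described
# in the abstract: dual-stream alternating enhancement (DAE) with IR-VIS-IR /
# VIS-IR-VIS round trips, and cross-modal feature interaction (CMFI) with a
# residual fusion of low- and high-level features from both modalities.
import torch
import torch.nn as nn


class DAEBlock(nn.Module):
    """Assumed wiring of the bidirectional feedback: each modality's features
    are translated to the other modality and back, and the round-trip result
    plus the cross-translated features refine the original stream."""

    def __init__(self, channels: int = 64):
        super().__init__()
        # Cross-modal "translators"; 3x3 convolutions are an arbitrary choice.
        self.ir_to_vis = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        self.vis_to_ir = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())

    def forward(self, f_ir: torch.Tensor, f_vis: torch.Tensor):
        ir_as_vis = self.ir_to_vis(f_ir)      # IR features mapped toward the VIS domain
        ir_back = self.vis_to_ir(ir_as_vis)   # ...and back: IR -> VIS -> IR
        vis_as_ir = self.vis_to_ir(f_vis)     # VIS features mapped toward the IR domain
        vis_back = self.ir_to_vis(vis_as_ir)  # ...and back: VIS -> IR -> VIS
        # Alternating enhancement: each stream is refined by its own round-trip
        # reconstruction and by the feature translated from the other modality.
        f_ir_enh = f_ir + ir_back + vis_as_ir
        f_vis_enh = f_vis + vis_back + ir_as_vis
        return f_ir_enh, f_vis_enh


class CMFIBlock(nn.Module):
    """Assumed residual fusion of low- and high-level features from both
    modalities; all four feature maps are taken to share shape (B, C, H, W)."""

    def __init__(self, channels: int = 64):
        super().__init__()
        # 1x1 convolution to merge the four concatenated feature maps.
        self.fuse = nn.Sequential(nn.Conv2d(4 * channels, channels, kernel_size=1), nn.ReLU())

    def forward(self, low_ir, low_vis, high_ir, high_vis):
        fused = self.fuse(torch.cat([low_ir, low_vis, high_ir, high_vis], dim=1))
        # Residual connection so the fused representation keeps the high-level cues.
        return fused + high_ir + high_vis


if __name__ == "__main__":
    # Toy usage with dummy feature maps (batch 2, 64 channels, 32x32).
    f_ir = torch.randn(2, 64, 32, 32)
    f_vis = torch.randn(2, 64, 32, 32)
    dae, cmfi = DAEBlock(64), CMFIBlock(64)
    e_ir, e_vis = dae(f_ir, f_vis)
    out = cmfi(f_ir, f_vis, e_ir, e_vis)
    print(out.shape)  # torch.Size([2, 64, 32, 32])
```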

Key words: cross-modal, deep learning, image classification, feature learning, dual-stream network

CLC number: