Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (1): 275-283. DOI: 10.11772/j.issn.1001-9081.2024010026

• Multimedia Computing and Computer Simulation •

Cross-modal dual-stream alternating interactive network for infrared-visible image classification

Zongsheng ZHENG1, Jia DU1, Yuhe CHENG1, Zecheng ZHAO1, Yuewei ZHANG2, Xulong WANG3

  1. College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
    2. Guangzhou Meteorological Satellite Ground Station, Guangzhou, Guangdong 510650, China
    3. Shandong Provincial Institute of Land Space Data and Remote Sensing Technology (Shandong Provincial Marine Dynamic Monitoring Center), Jinan, Shandong 250014, China
  • Received: 2024-01-15 Revised: 2024-03-26 Accepted: 2024-04-01 Online: 2024-05-09 Published: 2025-01-10
  • Contact: Jia DU
  • About author:ZHENG Zongsheng, born in 1979 in Tangshan, Hebei, Ph. D., associate professor. His research interests include deep learning and remote sensing image processing.
    CHENG Yuhe, born in 2000 in Bortala, Xinjiang, M. S. candidate. Her research interests include deep learning and remote sensing image processing.
    ZHAO Zecheng, born in 1999 in Tai'an, Shandong, M. S. candidate. His research interests include deep learning and ocean wave prediction.
    ZHANG Yuewei, born in 1977 in Boluo, Guangdong, engineer. His research interests include geographic information science and computer science.
    WANG Xulong, born in 1972 in Dongping, Shandong, M. S., senior engineer. His research interests include remote sensing and geographic information.
  • Supported by:
    National Natural Science Foundation of China (41671431); Shanghai Municipal Science and Technology Commission Local Colleges Capacity Building Project (19050502100); Guangzhou Meteorological Satellite Ground Station Project (D-8006-23-0157)

Abstract:

When multiple feature modalities are fused, noise from each modality is superimposed, and the cascaded structures commonly used to reduce inter-modal differences do not fully exploit the feature information between modalities. To address these issues, a cross-modal Dual-stream Alternating Interactive Network (DAINet) method was proposed. Firstly, a Dual-stream Alternating Enhancement (DAE) module was constructed to fuse modal features in an interactive dual-branch manner. By learning the mapping relationships between modal data and applying bidirectional feedback adjustment along InfRared-VISible-InfRared (IR-VIS-IR) and VISible-InfRared-VISible (VIS-IR-VIS) paths, cross suppression of inter-modal noise was realized. Secondly, a Cross-Modal Feature Interaction (CMFI) module was constructed, and a residual structure was introduced to effectively fuse low-level and high-level features within and between the infrared and visible modalities, thereby reducing inter-modal differences and making full use of inter-modal feature information. Finally, experiments were conducted on a self-constructed infrared-visible multi-modal typhoon dataset and the public RGB-NIR multi-modal scene dataset to verify the effectiveness of the DAE and CMFI modules. Experimental results demonstrate that, compared with the simple cascading fusion method, the proposed DAINet-based feature fusion method improves the overall classification accuracy on the self-constructed typhoon dataset by 6.61 and 3.93 percentage points for the infrared and visible modalities, respectively, and raises the G-mean by 6.24 and 2.48 percentage points, respectively, indicating the generalizability of the proposed method to class-imbalanced classification tasks. On the RGB-NIR dataset, the proposed method achieves overall classification accuracy improvements of 13.47 and 13.90 percentage points for the two test modalities, respectively. Comparison experiments with the IFCNN (general Image Fusion framework based on Convolutional Neural Network) and DenseFuse methods on the two datasets further show that the proposed method improves the overall classification accuracy by 9.82 and 6.02, and by 17.38 and 1.68 percentage points for the two test modalities on the self-constructed typhoon dataset.
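Since the abstract only outlines the two modules at a high level, the following is a minimal PyTorch-style sketch of how the described data flow could be organized. All class names, layer choices (3×3 and 1×1 convolutions), channel sizes, and the specific enhancement and fusion operations are assumptions made for illustration; they are not taken from the authors' implementation.

```python
# Hypothetical sketch (not the authors' code) of the two components described
# in the abstract: dual-stream alternating enhancement (DAE) with IR-VIS-IR /
# VIS-IR-VIS round trips, and cross-modal feature interaction (CMFI) with a
# residual fusion of low- and high-level features from both modalities.
import torch
import torch.nn as nn


class DAEBlock(nn.Module):
    """Assumed wiring of the bidirectional feedback: each modality's features
    are translated to the other modality and back, and the round-trip result
    plus the cross-translated features refine the original stream."""

    def __init__(self, channels: int = 64):
        super().__init__()
        # Cross-modal "translators"; 3x3 convolutions are an arbitrary choice.
        self.ir_to_vis = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        self.vis_to_ir = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())

    def forward(self, f_ir: torch.Tensor, f_vis: torch.Tensor):
        ir_as_vis = self.ir_to_vis(f_ir)      # IR features mapped toward the VIS domain
        ir_back = self.vis_to_ir(ir_as_vis)   # ...and back: IR -> VIS -> IR
        vis_as_ir = self.vis_to_ir(f_vis)     # VIS features mapped toward the IR domain
        vis_back = self.ir_to_vis(vis_as_ir)  # ...and back: VIS -> IR -> VIS
        # Alternating enhancement: each stream is refined by its own round-trip
        # reconstruction and by the feature translated from the other modality.
        f_ir_enh = f_ir + ir_back + vis_as_ir
        f_vis_enh = f_vis + vis_back + ir_as_vis
        return f_ir_enh, f_vis_enh


class CMFIBlock(nn.Module):
    """Assumed residual fusion of low- and high-level features from both
    modalities; all four feature maps are taken to share shape (B, C, H, W)."""

    def __init__(self, channels: int = 64):
        super().__init__()
        # 1x1 convolution to merge the four concatenated feature maps.
        self.fuse = nn.Sequential(nn.Conv2d(4 * channels, channels, kernel_size=1), nn.ReLU())

    def forward(self, low_ir, low_vis, high_ir, high_vis):
        fused = self.fuse(torch.cat([low_ir, low_vis, high_ir, high_vis], dim=1))
        # Residual connection so the fused representation keeps the high-level cues.
        return fused + high_ir + high_vis


if __name__ == "__main__":
    # Toy usage with dummy feature maps (batch 2, 64 channels, 32x32).
    f_ir = torch.randn(2, 64, 32, 32)
    f_vis = torch.randn(2, 64, 32, 32)
    dae, cmfi = DAEBlock(64), CMFIBlock(64)
    e_ir, e_vis = dae(f_ir, f_vis)
    out = cmfi(f_ir, f_vis, e_ir, e_vis)
    print(out.shape)  # torch.Size([2, 64, 32, 32])
```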

Key words: cross-modal, deep learning, image classification, feature learning, dual-stream network

CLC number: