Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (9): 2949-2956.DOI: 10.11772/j.issn.1001-9081.2024081166

• Multimedia computing and computer simulation •

SAR and visible image fusion based on residual Swin Transformer

Jin LI1, Liqun LIU2

  1. College of Science, Gansu Agricultural University, Lanzhou, Gansu 730070, China
    2. College of Information Science and Technology, Gansu Agricultural University, Lanzhou, Gansu 730070, China
  • Received:2024-08-19 Revised:2024-11-28 Accepted:2024-12-10 Online:2025-02-17 Published:2025-09-10
  • Contact: Liqun LIU
  • About author: LI Jin, born in 2001 in Fuzhou, Jiangxi, M. S. candidate. Her research interests include deep learning and image fusion.
  • Supported by:
    National Natural Science Foundation of China(32460440);Gansu Provincial University Teacher Innovation Fund(2023A-051)


Abstract:

In research on the fusion of Synthetic Aperture Radar (SAR) and visible images, existing methods usually face the challenges of large inter-modal differences, information loss, and high computational complexity. Therefore, a SAR and visible image fusion algorithm based on a residual Swin Transformer module was proposed. Firstly, Swin Transformer was used as the backbone to extract global features, and a fully attentional feature-encoding backbone network was used to model long-range dependencies. Secondly, to improve the fusion effect, three different fusion strategies were designed: a feature fusion strategy based on the L1 norm of the sequence matrix, a fusion strategy based on the image pyramid, and an additive fusion strategy. Thirdly, the final fusion result was obtained by weighted averaging of the three intermediate results, which effectively adjusted pixel values and reduced the noise of the SAR image, better retained the clear details and structural information of the visible image, and fused ground-feature information of the SAR and visible images at different scales. Finally, extensive experiments were carried out on the SEN1-2 dataset, the QXS-SAROPT dataset, and OSdataset. Experimental results show that, compared with algorithms such as IFCNN (a general Image Fusion framework based on Convolutional Neural Network) and MDLatLRR (Multi-level Decomposition based on Latent Low-Rank Representation), the proposed algorithm achieves better subjective visual effects and significant improvements in most objective evaluation indicators, and exhibits excellent noise suppression and image fidelity while retaining source image features.
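The three fusion strategies and their weighted combination can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the L1-norm activity weighting, the two-level box-filter pyramid, the nearest-neighbour upsampling, and the final weights (0.4, 0.3, 0.3) are all simplifying assumptions chosen for clarity; the paper's actual strategies operate on Swin Transformer feature sequences rather than raw images.

```python
import numpy as np

def l1_activity_fusion(f_a, f_b, eps=1e-8):
    """Weight each source by its L1 activity |f| / (|f_a| + |f_b|), elementwise."""
    w_a, w_b = np.abs(f_a), np.abs(f_b)
    s = w_a + w_b + eps
    return (w_a / s) * f_a + (w_b / s) * f_b

def down2(img):
    """Crude 2x downsample with a 2x2 box filter (stand-in for a Gaussian kernel)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up2(img, shape):
    """Nearest-neighbour 2x upsample, cropped to a reference shape."""
    return np.kron(img, np.ones((2, 2)))[:shape[0], :shape[1]]

def build_laplacian(img, levels=3):
    """Laplacian pyramid: band-pass residuals plus a coarse base level."""
    pyr, cur = [], img
    for _ in range(levels - 1):
        down = down2(cur)
        pyr.append(cur - up2(down, cur.shape))  # detail lost by downsampling
        cur = down
    pyr.append(cur)
    return pyr

def collapse(pyr):
    """Invert the pyramid: upsample the base and add back each detail band."""
    cur = pyr[-1]
    for lap in reversed(pyr[:-1]):
        cur = up2(cur, lap.shape) + lap
    return cur

def pyramid_fusion(a, b, levels=3):
    """Average the two images band-by-band in the Laplacian domain."""
    pa, pb = build_laplacian(a, levels), build_laplacian(b, levels)
    return collapse([(la + lb) / 2.0 for la, lb in zip(pa, pb)])

def fuse_sar_visible(sar, vis, weights=(0.4, 0.3, 0.3)):
    """Weighted average of the three strategy outputs (weights are hypothetical)."""
    r1 = l1_activity_fusion(sar, vis)
    r2 = pyramid_fusion(sar, vis)
    r3 = (sar + vis) / 2.0  # additive fusion
    w1, w2, w3 = weights
    return w1 * r1 + w2 * r2 + w3 * r3
```

With power-of-two image sizes, the box-filter pyramid reconstructs its input exactly, so fusing an image with itself returns (approximately) the image, which is a quick sanity check on the combination logic.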

Key words: Synthetic Aperture Radar (SAR), visible image, image fusion, Transformer, deep learning

