Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (9): 2949-2956.DOI: 10.11772/j.issn.1001-9081.2024081166

• Multimedia computing and computer simulation •

SAR and visible image fusion based on residual Swin Transformer

Jin LI1, Liqun LIU2

  1. College of Science, Gansu Agricultural University, Lanzhou, Gansu 730070, China
    2. College of Information Science and Technology, Gansu Agricultural University, Lanzhou, Gansu 730070, China
  • Received:2024-08-19 Revised:2024-11-28 Accepted:2024-12-10 Online:2025-02-17 Published:2025-09-10
  • Contact: Liqun LIU
  • About author: LI Jin, born in 2001 in Fuzhou, Jiangxi, M. S. candidate. Her research interests include deep learning and image fusion.
  • Supported by:
    National Natural Science Foundation of China(32460440);Gansu Provincial University Teacher Innovation Fund(2023A-051)


Abstract:

In research on the fusion of Synthetic Aperture Radar (SAR) and visible images, existing methods usually face the challenges of large inter-modal differences, information loss, and high computational complexity. Therefore, a SAR and visible image fusion algorithm based on a residual Swin Transformer module was proposed. Firstly, Swin Transformer was used as the backbone to extract global features, and a fully attentional feature-encoding backbone network was used to model long-range dependencies. Secondly, to improve the fusion effect, three different fusion strategies were designed: a feature fusion strategy based on the L1 norm of the sequence matrix, a fusion strategy based on the image pyramid, and an additive fusion strategy. Thirdly, the final fusion result was obtained by weighted averaging of the three intermediate results, which effectively adjusted pixel values and reduced the noise of the SAR image, better retained the clear details and structural information of the visible image, and fused ground-feature information of the SAR and visible images at different scales. Finally, extensive experiments were carried out on the SEN1-2 dataset, the QXS-SAROPT dataset, and OSdataset. Experimental results show that, compared with algorithms such as IFCNN (a general Image Fusion framework based on Convolutional Neural Network) and MDLatLRR (Multi-level Decomposition based on Latent Low-Rank Representation), the proposed algorithm achieves better subjective visual effects and significant improvements in most objective evaluation indicators, and exhibits excellent noise suppression and image fidelity while retaining source image features.
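The three fusion strategies and their weighted combination can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the L1-norm activity weighting, the two-level box-filter pyramid, the nearest-neighbour upsampling, and the final weights (0.4, 0.3, 0.3) are all simplifying assumptions chosen for clarity; the paper's actual strategies operate on Swin Transformer feature sequences rather than raw images.

```python
import numpy as np

def l1_activity_fusion(f_a, f_b, eps=1e-8):
    """Weight each source by its L1 activity |f| / (|f_a| + |f_b|), elementwise."""
    w_a, w_b = np.abs(f_a), np.abs(f_b)
    s = w_a + w_b + eps
    return (w_a / s) * f_a + (w_b / s) * f_b

def down2(img):
    """Crude 2x downsample with a 2x2 box filter (stand-in for a Gaussian kernel)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up2(img, shape):
    """Nearest-neighbour 2x upsample, cropped to a reference shape."""
    return np.kron(img, np.ones((2, 2)))[:shape[0], :shape[1]]

def build_laplacian(img, levels=3):
    """Laplacian pyramid: band-pass residuals plus a coarse base level."""
    pyr, cur = [], img
    for _ in range(levels - 1):
        down = down2(cur)
        pyr.append(cur - up2(down, cur.shape))  # detail lost by downsampling
        cur = down
    pyr.append(cur)
    return pyr

def collapse(pyr):
    """Invert the pyramid: upsample the base and add back each detail band."""
    cur = pyr[-1]
    for lap in reversed(pyr[:-1]):
        cur = up2(cur, lap.shape) + lap
    return cur

def pyramid_fusion(a, b, levels=3):
    """Average the two images band-by-band in the Laplacian domain."""
    pa, pb = build_laplacian(a, levels), build_laplacian(b, levels)
    return collapse([(la + lb) / 2.0 for la, lb in zip(pa, pb)])

def fuse_sar_visible(sar, vis, weights=(0.4, 0.3, 0.3)):
    """Weighted average of the three strategy outputs (weights are hypothetical)."""
    r1 = l1_activity_fusion(sar, vis)
    r2 = pyramid_fusion(sar, vis)
    r3 = (sar + vis) / 2.0  # additive fusion
    w1, w2, w3 = weights
    return w1 * r1 + w2 * r2 + w3 * r3
```

With power-of-two image sizes, the box-filter pyramid reconstructs its input exactly, so fusing an image with itself returns (approximately) the image, which is a quick sanity check on the combination logic.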

Key words: Synthetic Aperture Radar (SAR), visible image, image fusion, Transformer, deep learning

