Generative adversarial network underwater image enhancement model based on Swin Transformer

doi:10.11772/j.issn.1001-9081.2024050730

Journal of Computer Applications, 2025, Vol. 45, Issue 5: 1439-1446. DOI: 10.11772/j.issn.1001-9081.2024050730

2024 China Granular Computing and Knowledge Discovery Conference (CGCKD2024)

Hui LI, Bingzhi JIA, Chenxi WANG, Ziyu DONG, Jilong LI, Zhaoman ZHONG, Yanyan CHEN

School of Computer Engineering, Jiangsu Ocean University, Lianyungang, Jiangsu 222005, China

Received: 2024-06-03 Revised: 2024-07-12 Accepted: 2024-07-18 Online: 2024-08-12 Published: 2025-05-10
Contact: Yanyan CHEN
About the authors: LI Hui, born in 1979, Ph.D., professor. Her research interests include image processing and computer vision.
JIA Bingzhi, born in 1999, M.S. candidate. His research interests include image enhancement and deep learning.
WANG Chenxi, born in 2002. His research interests include image processing.
DONG Ziyu, born in 2004. His research interests include artificial intelligence.
LI Jilong, born in 2000, M.S. candidate. His research interests include artificial intelligence.
ZHONG Zhaoman, born in 1977, Ph.D., professor. His research interests include artificial intelligence, natural language processing, big data collection and analysis, and social network analysis.
CHEN Yanyan, born in 1972, M.S., lecturer. Her research interests include artificial intelligence and intelligent information processing.
Supported by: National Natural Science Foundation of China (72174079); Sixth Phase of the “521” Program of Lianyungang City (LYG06521202351); Lianyungang Science and Technology Plan Program (CG2325)

Chinese title: 基于Swin Transformer的生成对抗网络水下图像增强模型

Additional author details: LI Hui, DONG Ziyu and ZHONG Zhaoman are from Lianyungang, Jiangsu; JIA Bingzhi is from Zaozhuang, Shandong; WANG Chenxi is from Xuzhou, Jiangsu; LI Jilong is from Nanchang, Jiangxi; CHEN Yanyan is from Dongming, Shandong. ZHONG Zhaoman is a CCF member.

Abstract

To address the low contrast, heavy noise and color deviation of underwater images, a new underwater image enhancement model named SwinGAN (GAN based on Swin Transformer) was proposed, with a Generative Adversarial Network (GAN) as its core framework. Firstly, the generative network was designed with an encoder-bottleneck-decoder structure, and the input feature maps were divided into multiple non-overlapping local windows at the bottleneck layer. Secondly, a Dual-path Window Multi-head Self-Attention (DWMSA) mechanism was introduced to strengthen local attention while also capturing global information and long-range dependencies. Finally, the decoder recombined the windows into feature maps of the original size, and the discriminator network used a Markov discriminator. Compared with the URSCT-SESR model, SwinGAN improves Peak Signal-to-Noise Ratio (PSNR) by 0.8372 dB and Structural SIMilarity index (SSIM) by 0.0036 on the UFO-120 dataset. On the EUVP-515 dataset, the gains are larger: PSNR rises by 0.8439 dB, SSIM by 0.0051, Underwater Image Quality Measure (UIQM) by 0.1124, and Underwater Color Image Quality Evaluation (UCIQE) by 0.0010. These results indicate that SwinGAN performs well in both subjective and objective evaluation, with a notable improvement in correcting the color deviation of underwater images.
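The window step described in the abstract (splitting a feature map into non-overlapping local windows, then recombining them into a map of the original size) can be illustrated with a small sketch. This is not the authors' code: it assumes a single-channel map stored as a list of rows, a hypothetical window size `m`, and the helper names `window_partition` and `window_reverse` are chosen here for illustration only.

```python
def window_partition(fmap, m):
    """Split an H x W feature map (list of rows) into non-overlapping m x m windows."""
    h, w = len(fmap), len(fmap[0])
    if h % m or w % m:
        raise ValueError("height and width must be multiples of the window size")
    windows = []
    for top in range(0, h, m):          # walk window rows
        for left in range(0, w, m):     # walk window columns
            windows.append([row[left:left + m] for row in fmap[top:top + m]])
    return windows


def window_reverse(windows, m, h, w):
    """Reassemble windows (row-major window order) into an H x W feature map."""
    fmap = [[None] * w for _ in range(h)]
    per_row = w // m                    # number of windows per window row
    for idx, win in enumerate(windows):
        top, left = (idx // per_row) * m, (idx % per_row) * m
        for r in range(m):
            fmap[top + r][left:left + m] = win[r]
    return fmap


# Toy 4 x 4 map with values 0..15, split into four 2 x 2 windows.
fmap = [[r * 4 + c for c in range(4)] for r in range(4)]
windows = window_partition(fmap, 2)
restored = window_reverse(windows, 2, 4, 4)
print(len(windows), windows[0], restored == fmap)
```

In a window-based attention model, self-attention would be computed inside each window between these two steps; that part is omitted here because the page does not give the DWMSA details.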

Key words: underwater image enhancement, Swin Transformer, Generative Adversarial Network (GAN), multi-head self-attention mechanism, Markov discriminator

CLC Number: TP391.41

Hui LI, Bingzhi JIA, Chenxi WANG, Ziyu DONG, Jilong LI, Zhaoman ZHONG, Yanyan CHEN. Generative adversarial network underwater image enhancement model based on Swin Transformer[J]. Journal of Computer Applications, 2025, 45(5): 1439-1446.

Figures/Tables 9

Fig. 1 Generation network of SwinGAN

Fig. 2 Overall structure of DWMSA

Fig. 3 Schematic detail diagram of DWMSA

Fig. 4 Discriminator network

Tab. 1 Parameter comparison experimental results for loss functions

Experiment No.	Parameter settings				Evaluation metrics
Experiment No.	$\lambda_1$	$\lambda_2$	$\lambda_3$	$\lambda_4$	PSNR/dB	SSIM	UIQM	UCIQE
1	0.1	0.2	0.1	0.6	29.9271	0.8514	2.8596	0.5869
2	0.1	0.2	0.2	0.5	29.8902	0.8559	2.7960	0.5865
3	0.1	0.2	0.3	0.4	30.0280	0.8491	2.8415	0.5824
4	0.1	0.2	0.4	0.3	29.9030	0.8592	2.9039	0.5833
5	0.1	0.2	0.5	0.2	29.9231	0.8563	2.8448	0.5850
6	0.1	0.2	0.6	0.1	30.0232	0.8581	2.8432	0.5830
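A quick way to read Tab. 1 is to check how the four weights relate to each other and which setting wins on each metric. The sketch below only re-uses the numbers printed in the table; the tuple layout and variable names are illustrative assumptions, and the loss terms that the weights scale are not named on this page.

```python
# Rows of Tab. 1: (experiment, lambda1, lambda2, lambda3, lambda4, PSNR/dB, SSIM, UIQM, UCIQE)
rows = [
    (1, 0.1, 0.2, 0.1, 0.6, 29.9271, 0.8514, 2.8596, 0.5869),
    (2, 0.1, 0.2, 0.2, 0.5, 29.8902, 0.8559, 2.7960, 0.5865),
    (3, 0.1, 0.2, 0.3, 0.4, 30.0280, 0.8491, 2.8415, 0.5824),
    (4, 0.1, 0.2, 0.4, 0.3, 29.9030, 0.8592, 2.9039, 0.5833),
    (5, 0.1, 0.2, 0.5, 0.2, 29.9231, 0.8563, 2.8448, 0.5850),
    (6, 0.1, 0.2, 0.6, 0.1, 30.0232, 0.8581, 2.8432, 0.5830),
]
metrics = {"PSNR": 5, "SSIM": 6, "UIQM": 7, "UCIQE": 8}  # column index per metric

# In every experiment the four weights add up to 1 (within float tolerance).
weights_sum_to_one = all(abs(sum(r[1:5]) - 1.0) < 1e-9 for r in rows)

# Experiment number with the highest value for each metric (higher is better for all four).
best = {name: max(rows, key=lambda r: r[col])[0] for name, col in metrics.items()}

print(weights_sum_to_one)
print(best)
```

Only the last two weights vary across the six experiments, and no single setting is best on all four metrics, so the choice of weights is a trade-off between metrics.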

Fig. 5 Comparison of effects of different models on UFO-120 dataset

Fig. 6 Comparison of effects of different models on EUVP dataset

Tab. 2 Evaluation index comparison of different models on UFO-120 and EUVP-515 datasets

Model	UFO-120				EUVP-515
Model	PSNR/dB	SSIM	UIQM	UCIQE	PSNR/dB	SSIM	UIQM	UCIQE
UWCNN	28.4583	0.7113	3.1642	0.5659	28.5580	0.7872	3.0353	0.5663
UGAN	30.0392	0.8199	2.6152	0.5983	30.2705	0.8769	2.4842	0.5858
FUnIE_GAN	30.2379	0.8262	2.4408	0.5981	30.2161	0.8654	2.4377	0.5921
Deep_SESR	30.5857	0.8592	3.1064	0.5940	30.5919	0.8704	3.0757	0.5854
URSCT-SESR	31.5216	0.8638	3.0966	0.5996	32.7021	0.9121	3.1410	0.5955
SwinGAN	32.3588	0.8674	3.1692	0.5984	33.5460	0.9172	3.2534	0.5965
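The improvements quoted in the abstract are differences between the SwinGAN and URSCT-SESR rows of Tab. 2. The sketch below recomputes them from the table values; the dictionary layout and the `gains` helper are illustrative assumptions, not part of the paper.

```python
METRICS = ("PSNR", "SSIM", "UIQM", "UCIQE")

# Values copied from the URSCT-SESR and SwinGAN rows of Tab. 2.
table2 = {
    "URSCT-SESR": {
        "UFO-120": (31.5216, 0.8638, 3.0966, 0.5996),
        "EUVP-515": (32.7021, 0.9121, 3.1410, 0.5955),
    },
    "SwinGAN": {
        "UFO-120": (32.3588, 0.8674, 3.1692, 0.5984),
        "EUVP-515": (33.5460, 0.9172, 3.2534, 0.5965),
    },
}


def gains(model, baseline, dataset):
    """Per-metric difference (model minus baseline) on one dataset, rounded to 4 decimals."""
    pairs = zip(METRICS, table2[model][dataset], table2[baseline][dataset])
    return {metric: round(a - b, 4) for metric, a, b in pairs}


ufo = gains("SwinGAN", "URSCT-SESR", "UFO-120")
euvp = gains("SwinGAN", "URSCT-SESR", "EUVP-515")
print(ufo)
print(euvp)
```

The differences match the abstract (0.8372 dB and 0.0036 on UFO-120; 0.8439 dB, 0.0051, 0.1124 and 0.0010 on EUVP-515). The same calculation also shows that on UFO-120 the UCIQE of SwinGAN is 0.0012 lower than that of URSCT-SESR, a figure the abstract does not quote.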

Tab. 3 Evaluation index comparison of improved modules on EUVP-515 dataset

Model	W-MSA	FNN	Double convolution	PSNR/dB	SSIM	UIQM	UCIQE
Model1	√			29.431	0.842	2.368	0.439
Model2	√	√		29.503	0.783	2.775	0.590
Model3		√	√	30.336	0.834	2.825	0.591
Proposed model	√	√	√	33.546	0.917	3.253	0.596
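The PSNR/dB columns in the tables report peak signal-to-noise ratio. As a reminder of what that number measures, here is a minimal sketch of the standard definition, PSNR = 10·log10(MAX² / MSE). It assumes 8-bit pixel values (MAX = 255) passed as flat sequences; the evaluation protocol actually used in the paper (color space, image size) is not stated on this page, and the pixel values below are made up for illustration.

```python
import math


def psnr(reference, enhanced, max_value=255.0):
    """PSNR in dB between two equally sized images given as flat pixel sequences."""
    if not reference or len(reference) != len(enhanced):
        raise ValueError("images must be non-empty and have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, enhanced)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 4-pixel example: squared errors 4, 0, 1, 4 give MSE = 2.25.
reference = [52, 55, 61, 59]
enhanced = [54, 55, 60, 57]
value = psnr(reference, enhanced)
print(round(value, 2))
```

Because the scale is logarithmic, a gain of about 0.84 dB, as reported against URSCT-SESR, corresponds to roughly an 18% reduction in mean squared error (10^(-0.084) ≈ 0.82).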
