Generative adversarial network underwater image enhancement model based on Swin Transformer

doi:10.11772/j.issn.1001-9081.2024050730

Journal of Computer Applications, 2025, Vol. 45, Issue (5): 1439-1446. DOI: 10.11772/j.issn.1001-9081.2024050730

• 2024 China Conference on Granular Computing and Knowledge Discovery

Hui LI, Bingzhi JIA, Chenxi WANG, Ziyu DONG, Jilong LI, Zhaoman ZHONG, Yanyan CHEN

School of Computer Engineering, Jiangsu Ocean University, Lianyungang, Jiangsu 222005, China

Received: 2024-06-03 Revised: 2024-07-12 Accepted: 2024-07-18 Online: 2024-08-12 Published: 2025-05-10
Contact: Yanyan CHEN
About the authors: LI Hui, born in 1979, native of Lianyungang, Jiangsu, Ph. D., professor. Her research interests include image processing and computer vision.
JIA Bingzhi, born in 1999, native of Zaozhuang, Shandong, M. S. candidate. His research interests include image enhancement and deep learning.
WANG Chenxi, born in 2002, native of Xuzhou, Jiangsu. His research interests include image processing.
DONG Ziyu, born in 2004, native of Lianyungang, Jiangsu. His research interests include artificial intelligence.
LI Jilong, born in 2000, native of Nanchang, Jiangxi, M. S. candidate. His research interests include artificial intelligence.
ZHONG Zhaoman, born in 1977, native of Lianyungang, Jiangsu, Ph. D., professor, CCF member. His research interests include artificial intelligence, natural language processing, big data collection and analysis, and social network analysis.
CHEN Yanyan, born in 1972, native of Dongming, Shandong, M. S., lecturer. Her research interests include artificial intelligence and intelligent information processing.
Supported by:
National Natural Science Foundation of China (72174079); Sixth Phase of the "521" Program of Lianyungang City (LYG06521202351); Lianyungang Science and Technology Plan Program (CG2325)

Abstract

To address the low contrast, heavy noise, and color deviation of underwater images, SwinGAN (GAN based on Swin Transformer), an underwater image enhancement model that uses a Generative Adversarial Network (GAN) as its core framework, was proposed. Firstly, the generator network follows an encoder-bottleneck-decoder structure, and the input feature maps are divided into multiple non-overlapping local windows at the bottleneck layer. Secondly, a Dual-path Window Multi-head Self-Attention (DWMSA) mechanism is introduced to strengthen local attention while also improving the capture of global information and long-range dependencies. Finally, in the decoder, the downsampled feature maps are recombined through multiple upsampling windows into feature maps of the original size, and the discriminator network adopts a Markov discriminator. Experimental results show that, compared with the URSCT-SESR model, SwinGAN improves the Peak Signal-to-Noise Ratio (PSNR) by 0.837 2 dB and the Structural SIMilarity (SSIM) index by 0.003 6 on the UFO-120 dataset. On the EUVP-515 dataset, SwinGAN improves PSNR by 0.843 9 dB, SSIM by 0.005 1, and the Underwater Image Quality Measure (UIQM) by 0.112 4, while the Underwater Color Image Quality Evaluation (UCIQE) rises slightly, by 0.001 0. SwinGAN therefore performs well in both subjective evaluation and objective metrics, and it is effective at correcting color deviation in underwater images.

Key words: underwater image enhancement, Swin Transformer, Generative Adversarial Network (GAN), multi-head self-attention mechanism, Markov discriminator
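
The abstract describes splitting the bottleneck feature map into non-overlapping local windows and later recombining windows into a full-size map. The NumPy sketch below illustrates only that split-and-merge bookkeeping. It is not the authors' code: the function names `window_partition` and `window_reverse`, the channel-last `(H, W, C)` layout, and the window size of 8 are assumptions chosen for illustration.

```python
import numpy as np


def window_partition(x, ws):
    """Split an (H, W, C) feature map into non-overlapping (ws, ws, C) windows."""
    h, w, c = x.shape
    assert h % ws == 0 and w % ws == 0, "H and W must be multiples of the window size"
    # Separate each spatial axis into (window index, offset inside the window).
    x = x.reshape(h // ws, ws, w // ws, ws, c)
    # Bring the two window indices together, then flatten them into one axis.
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws, ws, c)


def window_reverse(windows, ws, h, w):
    """Inverse of window_partition: stitch the windows back into an (H, W, C) map."""
    c = windows.shape[-1]
    x = windows.reshape(h // ws, w // ws, ws, ws, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(h, w, c)


# Hypothetical 16x16 feature map with 3 channels and an assumed window size of 8.
feat = np.arange(16 * 16 * 3, dtype=np.float32).reshape(16, 16, 3)
wins = window_partition(feat, 8)
restored = window_reverse(wins, 8, 16, 16)
print(wins.shape, np.array_equal(restored, feat))
```

Self-attention would then be computed independently inside each of the four windows, which keeps the attention cost tied to the window size rather than to the full feature map.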

CLC number:

TP391.41

Hui LI, Bingzhi JIA, Chenxi WANG, Ziyu DONG, Jilong LI, Zhaoman ZHONG, Yanyan CHEN. Generative adversarial network underwater image enhancement model based on Swin Transformer[J]. Journal of Computer Applications, 2025, 45(5): 1439-1446.

Figures and tables (9)

Fig. 1 Generator network of SwinGAN

Fig. 2 Overall structure of DWMSA

Fig. 3 Detailed schematic of DWMSA

Fig. 4 Discriminator network

Tab. 1 Experimental results for different loss function weight settings

Experiment No.	Parameter settings				Evaluation metrics
	$\lambda_1$	$\lambda_2$	$\lambda_3$	$\lambda_4$	PSNR/dB	SSIM	UIQM	UCIQE
1	0.1	0.2	0.1	0.6	29.927 1	0.851 4	2.859 6	0.586 9
2	0.1	0.2	0.2	0.5	29.890 2	0.855 9	2.796 0	0.586 5
3	0.1	0.2	0.3	0.4	30.028 0	0.849 1	2.841 5	0.582 4
4	0.1	0.2	0.4	0.3	29.903 0	0.859 2	2.903 9	0.583 3
5	0.1	0.2	0.5	0.2	29.923 1	0.856 3	2.844 8	0.585 0
6	0.1	0.2	0.6	0.1	30.023 2	0.858 1	2.843 2	0.583 0
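
Because Table 1 varies only $\lambda_3$ and $\lambda_4$ while every setting reports four metrics, it helps to see which experiment leads on each metric. The short sketch below does that, using the values copied from Table 1 with the digit-group spaces removed. It assumes that higher is better for all four metrics, and the variable names are illustrative.

```python
# Rows of Table 1: experiment -> (lambda1, lambda2, lambda3, lambda4, PSNR, SSIM, UIQM, UCIQE)
rows = {
    1: (0.1, 0.2, 0.1, 0.6, 29.9271, 0.8514, 2.8596, 0.5869),
    2: (0.1, 0.2, 0.2, 0.5, 29.8902, 0.8559, 2.7960, 0.5865),
    3: (0.1, 0.2, 0.3, 0.4, 30.0280, 0.8491, 2.8415, 0.5824),
    4: (0.1, 0.2, 0.4, 0.3, 29.9030, 0.8592, 2.9039, 0.5833),
    5: (0.1, 0.2, 0.5, 0.2, 29.9231, 0.8563, 2.8448, 0.5850),
    6: (0.1, 0.2, 0.6, 0.1, 30.0232, 0.8581, 2.8432, 0.5830),
}
metrics = ("PSNR", "SSIM", "UIQM", "UCIQE")

# Experiment number with the highest value for each metric (metrics start at index 4).
best = {m: max(rows, key=lambda k: rows[k][4 + i]) for i, m in enumerate(metrics)}

# Check that the four weights of every setting add up to 1.
weights_sum_to_one = all(abs(sum(r[:4]) - 1.0) < 1e-9 for r in rows.values())

print(best)
print(weights_sum_to_one)
```

In this table the four weights of every setting sum to 1, and no single setting is best on all four metrics: experiment 3 has the highest PSNR, experiment 4 the highest SSIM and UIQM, and experiment 1 the highest UCIQE.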

Fig. 5 Visual comparison of different models on the UFO-120 dataset

Fig. 6 Visual comparison of different models on the EUVP dataset

Tab. 2 Comparison of evaluation metrics of different models on the UFO-120 and EUVP-515 datasets

Model	UFO-120				EUVP-515
	PSNR/dB	SSIM	UIQM	UCIQE	PSNR/dB	SSIM	UIQM	UCIQE
UWCNN	28.458 3	0.711 3	3.164 2	0.565 9	28.558 0	0.787 2	3.035 3	0.566 3
UGAN	30.039 2	0.819 9	2.615 2	0.598 3	30.270 5	0.876 9	2.484 2	0.585 8
FUnIE_GAN	30.237 9	0.826 2	2.440 8	0.598 1	30.216 1	0.865 4	2.437 7	0.592 1
Deep_SESR	30.585 7	0.859 2	3.106 4	0.594 0	30.591 9	0.870 4	3.075 7	0.585 4
URSCT-SESR	31.521 6	0.863 8	3.096 6	0.599 6	32.702 1	0.912 1	3.141 0	0.595 5
SwinGAN	32.358 8	0.867 4	3.169 2	0.598 4	33.546 0	0.917 2	3.253 4	0.596 5
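
The gains quoted in the abstract are row differences between SwinGAN and URSCT-SESR in Table 2. The sketch below recomputes them from the table values (digit-group spaces removed) as a consistency check. The dictionary layout and names are illustrative.

```python
metrics = ("PSNR", "SSIM", "UIQM", "UCIQE")

# Values copied from the URSCT-SESR and SwinGAN rows of Table 2.
ursct_sesr = {
    "UFO-120": (31.5216, 0.8638, 3.0966, 0.5996),
    "EUVP-515": (32.7021, 0.9121, 3.1410, 0.5955),
}
swingan = {
    "UFO-120": (32.3588, 0.8674, 3.1692, 0.5984),
    "EUVP-515": (33.5460, 0.9172, 3.2534, 0.5965),
}

# Per-dataset, per-metric difference (SwinGAN minus URSCT-SESR), rounded to 4 decimals.
gains = {
    ds: {m: round(s - b, 4) for m, s, b in zip(metrics, swingan[ds], ursct_sesr[ds])}
    for ds in ursct_sesr
}

for ds, g in gains.items():
    print(ds, g)
```

The differences match the abstract: 0.8372 dB PSNR and 0.0036 SSIM on UFO-120, and 0.8439 dB PSNR, 0.0051 SSIM, 0.1124 UIQM, and 0.0010 UCIQE on EUVP-515. The same subtraction gives a UIQM gain of 0.0726 and a UCIQE difference of -0.0012 on UFO-120, two values the abstract does not quote.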

Tab. 3 Comparison of evaluation metrics for the improved modules on the EUVP-515 dataset

Model	W-MSA	FNN	Dual convolution	PSNR/dB	SSIM	UIQM	UCIQE
Model1	√			29.431	0.842	2.368	0.439
Model2	√	√		29.503	0.783	2.775	0.590
Model3		√	√	30.336	0.834	2.825	0.591
Proposed model	√	√	√	33.546	0.917	3.253	0.596
