Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (5): 1439-1446. DOI: 10.11772/j.issn.1001-9081.2024050730

• 2024 China Granular Computing and Knowledge Discovery Conference •

Generative adversarial network underwater image enhancement model based on Swin Transformer

Hui LI, Bingzhi JIA, Chenxi WANG, Ziyu DONG, Jilong LI, Zhaoman ZHONG, Yanyan CHEN

  1. School of Computer Engineering, Jiangsu Ocean University, Lianyungang, Jiangsu 222005, China
  • Received: 2024-06-03 Revised: 2024-07-12 Accepted: 2024-07-18 Online: 2024-08-12 Published: 2025-05-10
  • Contact: Yanyan CHEN
  • About the authors: LI Hui, born in 1979, native of Lianyungang, Jiangsu, Ph.D., professor. Her research interests include image processing and computer vision.
    JIA Bingzhi, born in 1999, native of Zaozhuang, Shandong, M.S. candidate. His research interests include image enhancement and deep learning.
    WANG Chenxi, born in 2002, native of Xuzhou, Jiangsu. His research interests include image processing.
    DONG Ziyu, born in 2004, native of Lianyungang, Jiangsu. His research interests include artificial intelligence.
    LI Jilong, born in 2000, native of Nanchang, Jiangxi, M.S. candidate. His research interests include artificial intelligence.
    ZHONG Zhaoman, born in 1977, native of Lianyungang, Jiangsu, Ph.D., professor, CCF member. His research interests include artificial intelligence, natural language processing, big data collection and analysis, and social network analysis.
    CHEN Yanyan, born in 1972, native of Dongming, Shandong, M.S., lecturer. Her research interests include artificial intelligence and intelligent information processing.
  • Supported by:
    National Natural Science Foundation of China (72174079); Sixth Phase of the “521” Program of Lianyungang City (LYG06521202351); Lianyungang Science and Technology Plan Program (CG2325)

Abstract:

To address the low contrast, heavy noise, and color deviation of underwater images, an underwater image enhancement model named SwinGAN (GAN based on Swin Transformer) is proposed, with a Generative Adversarial Network (GAN) as its core framework. First, the generator follows an encoder-bottleneck-decoder design, and the bottleneck layer divides the input feature maps into multiple non-overlapping local windows. Second, a Dual-path Window Multi-head Self-Attention (DWMSA) mechanism is introduced to strengthen local attention while also improving the capture of global information and long-range dependencies. Finally, the decoder recombines the downsampled feature maps, through multiple upsampling windows, into feature maps of the original size, and the discriminator network is a Markov discriminator. Experimental results show that, compared with the URSCT-SESR model, SwinGAN improves the Peak Signal-to-Noise Ratio (PSNR) by 0.8372 dB and the Structural SIMilarity index (SSIM) by 0.0036 on the UFO-120 dataset. On the EUVP-515 dataset, it improves PSNR by 0.8439 dB, SSIM by 0.0051, and the Underwater Image Quality Measure (UIQM) by 0.1124, with a slight increase of 0.0010 in Underwater Color Image Quality Evaluation (UCIQE). SwinGAN thus performs well in both subjective evaluation and objective metrics and is effective at correcting color deviation in underwater images.
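
The abstract describes splitting feature maps into non-overlapping local windows and later recombining the windows into a full-size map. The following is a minimal sketch of that partition-and-reassemble step on a single channel, written in plain Python. The function names `window_partition` and `window_reverse`, the row-major window order, and the window size are assumptions made for illustration; the abstract does not give the model's actual window size, and the attention and upsampling stages are not shown.

```python
from typing import List

Grid = List[List[float]]  # one H x W feature-map channel


def window_partition(feature_map: Grid, window: int) -> List[Grid]:
    """Split an H x W grid into non-overlapping window x window tiles (row-major order)."""
    height, width = len(feature_map), len(feature_map[0])
    if height % window or width % window:
        raise ValueError("height and width must be multiples of the window size")
    tiles = []
    for top in range(0, height, window):
        for left in range(0, width, window):
            # Take `window` rows starting at `top`, and `window` columns starting at `left`.
            tiles.append([row[left:left + window] for row in feature_map[top:top + window]])
    return tiles


def window_reverse(tiles: List[Grid], window: int, height: int, width: int) -> Grid:
    """Reassemble tiles produced by window_partition into the original H x W grid."""
    per_row = width // window  # number of windows in each horizontal band
    grid = [[0.0] * width for _ in range(height)]
    for index, tile in enumerate(tiles):
        top, left = (index // per_row) * window, (index % per_row) * window
        for r in range(window):
            grid[top + r][left:left + window] = tile[r]
    return grid


if __name__ == "__main__":
    fmap = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
    tiles = window_partition(fmap, window=2)
    print(len(tiles))                               # 4
    print(tiles[0])                                 # [[0.0, 1.0], [4.0, 5.0]]
    print(window_reverse(tiles, 2, 4, 4) == fmap)   # True
```

Because the windows do not overlap, the partition is lossless: reversing it returns the input exactly, so self-attention can be computed inside each window independently and the results stitched back together.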

Key words: underwater image enhancement, Swin Transformer, Generative Adversarial Network (GAN), multi-head self-attention mechanism, Markov discriminator
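
The "Markov discriminator" in the key words refers to a patch-based discriminator: each output unit judges only a local patch of the input rather than the whole image. As a rough illustration of how large such a patch is, the sketch below computes the receptive field of a stack of convolutions. The layer layout used here (five 4 x 4 convolutions with strides 2, 2, 2, 1, 1, the widely used 70 x 70 PatchGAN configuration) is an assumption for illustration; the abstract does not state SwinGAN's discriminator layout.

```python
from typing import List, Tuple


def receptive_field(layers: List[Tuple[int, int]]) -> int:
    """Side length, in input pixels, seen by one output unit of a conv stack.

    Each layer is given as (kernel_size, stride). Padding shifts the window
    but does not change its size, so it is not needed here.
    """
    field, jump = 1, 1  # jump = distance in input pixels between adjacent units
    for kernel, stride in layers:
        field += (kernel - 1) * jump
        jump *= stride
    return field


# Assumed layout (not taken from the paper): the common 70 x 70 PatchGAN.
PATCHGAN_70 = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]

if __name__ == "__main__":
    print(receptive_field(PATCHGAN_70))  # 70
```

Under this assumed layout, every real/fake score depends on a 70 x 70 pixel region, which is why such a discriminator emphasizes local texture and color consistency.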

CLC number: