Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (5): 1588-1596. DOI: 10.11772/j.issn.1001-9081.2023050636

Special Issue: Multimedia Computing and Computer Simulation


Image super-resolution network based on global dependency Transformer

Zihan LIU, Dengwen ZHOU, Yukai LIU

  1. School of Control and Computer Engineering,North China Electric Power University,Beijing 102206,China
  • Received: 2023-05-23 Revised: 2023-08-31 Accepted: 2023-09-13 Online: 2023-09-19 Published: 2024-05-10
  • Contact: Zihan LIU
  • About author: ZHOU Dengwen, born in 1965 in Huangmei, Hubei, M. S., professor. His research interests include image denoising and image super-resolution.
    LIU Yukai, born in 1996 in Hengshui, Hebei, M. S. candidate. His research interests include deep learning and computer vision.
    First contact: LIU Zihan, born in 1997 in Shijiazhuang, Hebei, M. S. candidate. His research interests include deep learning and computer vision.

Abstract:

At present, deep learning based image super-resolution networks are implemented mainly with convolutions. Compared with the traditional Convolutional Neural Network (CNN), the main advantage of the Transformer in image super-resolution tasks is its ability to model long-distance dependencies. However, most Transformer-based image super-resolution models cannot establish global dependencies with a small number of parameters and few network layers, which limits their performance. To establish global dependencies in the super-resolution network, an image Super-Resolution network based on a Global Dependency Transformer (GDTSR) was proposed. Its main component is the Residual Square Axial Window Block (RSAWB); in its axial window Transformer residual layers, axial windows and self-attention allow every pixel to establish global dependencies with the entire feature map. In addition, the super-resolution image reconstruction modules of most current image super-resolution models consist only of convolutions. To integrate the extracted feature information dynamically, the Transformer and convolution were combined to jointly reconstruct super-resolution images. Experimental results show that the Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) of GDTSR are the best on five standard test sets (Set5, Set14, B100, Urban100 and Manga109) at all three scale factors (×2, ×3, ×4), and the performance improvement is especially obvious on Urban100 and Manga109, the datasets of large-size images.
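The abstract does not include implementation details, but the two mechanisms it names can be illustrated with a minimal sketch. Below is a hypothetical PyTorch illustration, assuming an axial self-attention in which each pixel attends along its full row and then its full column (so two 1-D passes connect every pixel to the whole feature map), followed by a reconstruction head that pairs this attention with convolutional upsampling. All class names (AxialAttention, ReconstructionHead) and hyperparameters are assumptions made for illustration, not the authors' actual RSAWB or GDTSR code.

```python
# Hypothetical sketch, not the authors' implementation: axial self-attention
# in which each pixel attends along its full row and then its full column.
import torch
import torch.nn as nn

class AxialAttention(nn.Module):
    """Row-wise then column-wise multi-head self-attention."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.row_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.col_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        b, c, h, w = x.shape
        # Row pass: every row becomes a sequence of W tokens.
        rows = x.permute(0, 2, 3, 1).reshape(b * h, w, c)
        rows, _ = self.row_attn(rows, rows, rows)
        x = rows.reshape(b, h, w, c).permute(0, 3, 1, 2)
        # Column pass: every column becomes a sequence of H tokens.
        cols = x.permute(0, 3, 2, 1).reshape(b * w, h, c)
        cols, _ = self.col_attn(cols, cols, cols)
        return cols.reshape(b, w, h, c).permute(0, 3, 2, 1)

class ReconstructionHead(nn.Module):
    """Convolutional upsampling combined with an attention block, mirroring
    the idea of reconstructing with Transformer and convolution jointly."""
    def __init__(self, dim: int, scale: int = 2):
        super().__init__()
        self.attn = AxialAttention(dim)
        self.up = nn.Sequential(
            nn.Conv2d(dim, dim * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),           # rearranges channels into space
            nn.Conv2d(dim, 3, 3, padding=1))  # project to an RGB image

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # Residual attention over the feature map, then upsample.
        return self.up(feat + self.attn(feat))

# Usage: upscale a 48x48 feature map by a factor of 2.
sr = ReconstructionHead(dim=64, scale=2)(torch.randn(1, 64, 48, 48))
print(sr.shape)  # torch.Size([1, 3, 96, 96])
```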

Key words: image super-resolution, Transformer, self-attention, global dependency, axial window

CLC Number: