Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (9): 2865-2875. DOI: 10.11772/j.issn.1001-9081.2021081386

• Multimedia Computing and Computer Simulation •

Depth estimation model of single haze image based on conditional generative adversarial network

Wentao ZHANG, Yuanyu WANG, Saize LI

  1. College of Information and Computer, Taiyuan University of Technology, Jinzhong, Shanxi 030600, China
  • Received: 2021-08-03 Revised: 2021-11-22 Accepted: 2021-11-22 Online: 2022-01-07 Published: 2022-09-10
  • Contact: Yuanyu WANG
  • About author: ZHANG Wentao, born in 1995 in Xinzhou, Shanxi, M. S. candidate, CCF member. His research interests include computer vision and depth estimation.
    LI Saize, born in 1998 in Yuncheng, Shanxi, M. S. candidate, CCF member. His research interests include computer vision and deep learning.
  • Supported by:
    Natural Science Foundation of Shanxi Province(201801D121142);Shanxi Scholarship Council of China

Abstract:

To address the degradation of traditional depth estimation models caused by image quality loss in haze environments, a depth estimation model for single haze images based on Conditional Generative Adversarial Network (CGAN) and fused with a dual attention mechanism was proposed. Firstly, for the generator of the model, a DenseUnet structure fused with the dual attention mechanism was proposed, in which dense blocks were used as the basic modules of the U-net encoding and decoding processes, and dense connections and skip connections were used to strengthen information flow while extracting the low-level structural features and high-level depth information of the direct transmission rate map. Then, the global dependencies of spatial features and channel features were adjusted adaptively by the dual attention module, and the least absolute value loss, perceptual loss, gradient loss and adversarial loss were combined into a new structure-preserving loss function. Finally, with the direct transmission rate map of the haze image as the condition of the CGAN, the depth map of the haze image was estimated through adversarial learning between the generator and the discriminator. Training and testing were performed on the indoor dataset NYU Depth v2 and the outdoor dataset DIODE. Experimental results show that the proposed model produces finer geometric structure and richer local details. On NYU Depth v2, compared with the fully convolutional residual network, the proposed model reduces the Logarithmic Mean Error (LME) and the Root Mean Square Error (RMSE) by 7% and 10% respectively; on DIODE, compared with the deep ordinal regression network, it improves the accuracy (threshold less than 1.25) by 7.6%. These results indicate that the proposed model improves the accuracy and generalization ability of depth estimation under haze interference.
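
The direct transmission rate map used as the CGAN condition comes from the standard atmospheric scattering (Koschmieder) model, which ties per-pixel transmission directly to scene depth; the relation below is the standard formulation of that model, not a derivation quoted from the paper itself:

$$ I(x) = J(x)\,t(x) + A\bigl(1 - t(x)\bigr), \qquad t(x) = e^{-\beta d(x)} \;\Longrightarrow\; d(x) = -\frac{1}{\beta}\ln t(x) $$

where I is the observed haze image, J the haze-free scene radiance, A the atmospheric light, β the scattering coefficient and d the scene depth. This monotonic link between t(x) and d(x) is why the transmission map is an informative condition for haze-image depth estimation.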
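
The abstract does not give the internals of the dual attention module. The sketch below assumes a DANet-style pair of position (spatial) and channel attention branches; the class names, the channel reduction factor of 8 and the learnable residual scale gamma are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class PositionAttention(nn.Module):
    """Spatial attention: every pixel attends over all other pixels (assumes channels >= 8)."""
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key   = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual scale

    def forward(self, x):
        b, c, h, w = x.size()
        q = self.query(x).view(b, -1, h * w).permute(0, 2, 1)   # B x HW x C'
        k = self.key(x).view(b, -1, h * w)                       # B x C' x HW
        attn = torch.softmax(torch.bmm(q, k), dim=-1)            # B x HW x HW
        v = self.value(x).view(b, -1, h * w)                     # B x C  x HW
        out = torch.bmm(v, attn.permute(0, 2, 1)).view(b, c, h, w)
        return self.gamma * out + x

class ChannelAttention(nn.Module):
    """Channel attention: every channel attends over all other channels."""
    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.size()
        q = x.view(b, c, -1)                            # B x C x HW
        k = x.view(b, c, -1).permute(0, 2, 1)           # B x HW x C
        attn = torch.softmax(torch.bmm(q, k), dim=-1)   # B x C x C
        v = x.view(b, c, -1)
        out = torch.bmm(attn, v).view(b, c, h, w)
        return self.gamma * out + x
```

The two branch outputs are typically summed with the input feature map, so each branch only has to learn a residual correction over the backbone features.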
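
A minimal sketch of how the four loss terms named in the abstract could be combined into one structure-preserving objective. The weights lambda_*, the VGG16 feature layer used for the perceptual term, and the finite-difference gradient operator are assumptions for illustration; the paper's exact formulation and weighting are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class StructurePreservingLoss(nn.Module):
    """Weighted sum of L1, perceptual, gradient and adversarial terms (weights are placeholders)."""
    def __init__(self, lambda_l1=1.0, lambda_perc=0.1, lambda_grad=1.0, lambda_adv=0.01):
        super().__init__()
        # Frozen VGG16 features up to relu3_3 for the perceptual term (assumed choice).
        vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features[:16].eval()
        for p in vgg.parameters():
            p.requires_grad_(False)
        self.vgg = vgg
        self.w = (lambda_l1, lambda_perc, lambda_grad, lambda_adv)

    @staticmethod
    def _grad(d):
        # Finite-difference image gradients along x and y.
        return d[..., :, 1:] - d[..., :, :-1], d[..., 1:, :] - d[..., :-1, :]

    def forward(self, pred, target, d_fake_logits):
        # pred/target: predicted and ground-truth depth maps, shape (B, 1, H, W) assumed.
        l1 = F.l1_loss(pred, target)
        # Perceptual term on VGG features; depth maps replicated to 3 channels, normalization omitted.
        perc = F.l1_loss(self.vgg(pred.repeat(1, 3, 1, 1)),
                         self.vgg(target.repeat(1, 3, 1, 1)))
        pdx, pdy = self._grad(pred)
        tdx, tdy = self._grad(target)
        grad = F.l1_loss(pdx, tdx) + F.l1_loss(pdy, tdy)
        # Adversarial term: the generator wants the discriminator to label its output as real.
        adv = F.binary_cross_entropy_with_logits(d_fake_logits,
                                                 torch.ones_like(d_fake_logits))
        l_l1, l_p, l_g, l_a = self.w
        return l_l1 * l1 + l_p * perc + l_g * grad + l_a * adv
```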
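
To make the final step concrete, here is a pix2pix-style sketch of one adversarial update in which the transmission map is the condition fed to both generator and discriminator. G, D, the optimizers and combined_loss are placeholders with assumed interfaces (combined_loss is assumed to take the prediction, the ground-truth depth and the discriminator logits); this is not the paper's training code.

```python
import torch
import torch.nn.functional as F

def cgan_step(G, D, g_opt, d_opt, t_map, depth_gt, combined_loss):
    """One conditional GAN update with the transmission map t_map as the condition."""
    # --- discriminator: real pairs (t_map, depth_gt) vs. fake pairs (t_map, G(t_map)) ---
    d_opt.zero_grad()
    fake = G(t_map).detach()
    real_logits = D(torch.cat([t_map, depth_gt], dim=1))
    fake_logits = D(torch.cat([t_map, fake], dim=1))
    d_loss = (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
              + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
    d_loss.backward()
    d_opt.step()

    # --- generator: fool the discriminator while matching ground truth under the combined loss ---
    g_opt.zero_grad()
    pred = G(t_map)
    g_logits = D(torch.cat([t_map, pred], dim=1))
    g_loss = combined_loss(pred, depth_gt, g_logits)
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```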
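
The reported metrics are standard for monocular depth estimation. A small NumPy sketch follows, with LME interpreted as the mean absolute log10 error (an assumption, since the abstract does not define it) and delta1 as the accuracy under the threshold 1.25 mentioned in the comparison on DIODE.

```python
import numpy as np

def depth_metrics(pred, gt, eps=1e-6):
    """RMSE, mean absolute log10 error, and threshold accuracies for depth maps."""
    pred = np.clip(pred, eps, None)
    gt = np.clip(gt, eps, None)
    rmse = np.sqrt(np.mean((pred - gt) ** 2))
    lme = np.mean(np.abs(np.log10(pred) - np.log10(gt)))
    ratio = np.maximum(pred / gt, gt / pred)
    delta1 = np.mean(ratio < 1.25)        # accuracy with threshold 1.25
    delta2 = np.mean(ratio < 1.25 ** 2)
    delta3 = np.mean(ratio < 1.25 ** 3)
    return {"RMSE": rmse, "LME": lme, "delta1": delta1, "delta2": delta2, "delta3": delta3}
```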

Key words: depth estimation, haze image, attention mechanism, gradient loss, Conditional Generative Adversarial Network (CGAN), direct transmission rate map
