Journal of Computer Applications, 2023, Vol. 43, Issue 6: 1736-1742. DOI: 10.11772/j.issn.1001-9081.2022060852

• The 37th CCF National Conference on Computer Applications (CCF NCCA 2022) •

基于金字塔分割注意力网络的单目深度估计方法

李文举1, 李梦颖1, 崔柳1, 储王慧1, 张益1, 高慧2

  1.上海应用技术大学 计算机科学与信息工程学院,上海 201418
    2.上海应用技术大学 艺术与设计学院,上海 201418
  • 收稿日期:2022-06-14 修回日期:2022-08-05 接受日期:2022-08-11 发布日期:2022-10-08 出版日期:2023-06-10
  • 通讯作者: 高慧
  • 作者简介:李文举(1964—),男,辽宁营口人,教授,博士,CCF高级会员,主要研究方向:计算机视觉、模式识别、智能检测
    李梦颖(1996—),女,江苏宿迁人,硕士研究生,主要研究方向:同步定位与地图构建(SLAM)、单目图像深度估计
    崔柳(1984—),女,辽宁锦州人,讲师,博士,主要研究方向:制导导航与控制、微传感器、生物医学信号处理
    储王慧(1998—),女,安徽池州人,硕士研究生,主要研究方向:3D目标检测
    张益(1997—),男,河南信阳人,硕士研究生,主要研究方向:点云处理

Monocular depth estimation method based on pyramid split attention network

Wenju LI1, Mengying LI1, Liu CUI1, Wanghui CHU1, Yi ZHANG1, Hui GAO2

  1.School of Computer Science and Information Engineering,Shanghai Institute of Technology,Shanghai 201418,China
    2.School of Art and Design,Shanghai Institute of Technology,Shanghai 201418,China
  • Received:2022-06-14 Revised:2022-08-05 Accepted:2022-08-11 Online:2022-10-08 Published:2023-06-10
  • Contact: Hui GAO
  • About author:LI Wenju, born in 1964, Ph. D., professor. His research interests include computer vision, pattern recognition, intelligent detection.
    LI Mengying, born in 1996, M. S. candidate. Her research interests include Simultaneous Localization And Mapping (SLAM), monocular image depth estimation.
    CUI Liu, born in 1984, Ph. D., lecturer. Her research interests include guidance, navigation and control, micro sensors, biomedical signal processing.
    CHU Wanghui, born in 1998, M. S. candidate. Her research interests include 3D object detection.
    ZHANG Yi, born in 1997, M. S. candidate. His research interests include point cloud processing.
  • Supported by:
    National Natural Science Foundation of China(61903256)

摘要:

针对目前单目图像在深度估计中依然存在边缘以及深度最大区域预测不准确的问题,提出了一种基于金字塔分割注意力网络的单目深度估计方法(PS-Net)。首先,PS-Net以边界引导和场景聚合网络(BS-Net)为基础,引入金字塔分割注意力(PSA)模块处理多尺度特征的空间信息并且有效建立多尺度通道注意力间的长期依赖关系,从而提取深度梯度变化剧烈的边界和深度最大的区域;然后,使用Mish函数作为解码器中的激活函数,以进一步提升网络的性能;最后,在NYUD v2(New York University Depth dataset v2)和iBims-1(independent Benchmark images and matched scans v1)数据集上进行训练评估。iBims-1数据集上的实验结果显示,所提网络在衡量定向深度误差(DDE)方面与BS-Net相比减小了1.42个百分点,正确预测深度像素的比例达到81.69%。以上表明所提网络在深度预测上具有较高的准确性。

关键词: 深度估计, 金字塔分割注意力, 三维场景, 深度特征, 监督学习

Abstract:

To address the inaccurate prediction of edges and of the farthest region in monocular image depth estimation, a monocular depth estimation method based on a Pyramid Split Attention Network (PS-Net) was proposed. Firstly, building on the Boundary-induced and Scene-aggregated Network (BS-Net), a Pyramid Split Attention (PSA) module was introduced in PS-Net to process the spatial information of multi-scale features and to establish long-range dependencies among multi-scale channel attentions, so that boundaries where the depth gradient changes sharply and the farthest region could be extracted. Then, the Mish function was used as the activation function in the decoder to further improve network performance. Finally, training and evaluation were performed on the NYUD v2 (New York University Depth dataset v2) and iBims-1 (independent Benchmark images and matched scans v1) datasets. Experimental results on the iBims-1 dataset show that, compared with BS-Net, the proposed network reduces the Directed Depth Error (DDE) by 1.42 percentage points, and that the proportion of correctly predicted depth pixels reaches 81.69%. These results indicate that the proposed network achieves high accuracy in depth prediction.
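
For readers unfamiliar with the Mish activation named in the abstract, the following minimal sketch evaluates its standard definition, Mish(x) = x · tanh(softplus(x)), on scalar inputs. It is not the authors' decoder code: the function names `softplus`, `mish`, and `relu` are illustrative assumptions, and ReLU is included only as a familiar point of comparison.

```python
import math


def softplus(x: float) -> float:
    """Softplus, log(1 + exp(x)), written in a numerically stable form."""
    # max(x, 0) + log1p(exp(-|x|)) equals log(1 + exp(x)) but never
    # exponentiates a large positive number, so it cannot overflow.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Standard Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def relu(x: float) -> float:
    """ReLU, shown only for comparison (hypothetical helper)."""
    return max(x, 0.0)


if __name__ == "__main__":
    # Unlike ReLU, Mish is smooth and lets small negative values through.
    for v in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(f"x={v:+.1f}  mish={mish(v):+.4f}  relu={relu(v):+.4f}")
```

In a deep learning framework the same function would normally be applied element-wise to decoder feature maps; how PS-Net places it within the decoder is not specified in the abstract.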

Key words: depth estimation, Pyramid Split Attention (PSA), Three-Dimensional (3D) scene, depth feature, supervised learning

CLC number: