Monocular depth estimation method based on pyramid split attention network

Journal of Computer Applications, 2023, Vol. 43, Issue (6): 1736-1742. DOI: 10.11772/j.issn.1001-9081.2022060852

• The 37th CCF National Conference on Computer Applications (CCF NCCA 2022) •

Wenju LI¹, Mengying LI¹, Liu CUI¹, Wanghui CHU¹, Yi ZHANG¹, Hui GAO² (corresponding author)

^1. School of Computer Science and Information Engineering, Shanghai Institute of Technology, Shanghai 201418, China
^2. School of Art and Design, Shanghai Institute of Technology, Shanghai 201418, China

Received: 2022-06-14 Revised: 2022-08-05 Accepted: 2022-08-11 Online: 2022-10-08 Published: 2023-06-10
Contact: Hui GAO
About the authors:
LI Wenju, born in 1964 in Yingkou, Liaoning, Ph.D., professor, CCF senior member. His research interests include computer vision, pattern recognition and intelligent detection.
LI Mengying, born in 1996 in Suqian, Jiangsu, M.S. candidate. Her research interests include Simultaneous Localization And Mapping (SLAM) and monocular image depth estimation.
CUI Liu, born in 1984 in Jinzhou, Liaoning, Ph.D., lecturer. Her research interests include guidance, navigation and control, micro sensors and biomedical signal processing.
CHU Wanghui, born in 1998 in Chizhou, Anhui, M.S. candidate. Her research interests include 3D object detection.
ZHANG Yi, born in 1997 in Xinyang, Henan, M.S. candidate. His research interests include point cloud processing.
Supported by: National Natural Science Foundation of China (61903256)

Abstract

To address the inaccurate prediction of edges and of the farthest region in monocular image depth estimation, a monocular depth estimation method based on a Pyramid Split attention Network (PS-Net) was proposed. Firstly, building on the Boundary-induced and Scene-aggregated Network (BS-Net), a Pyramid Split Attention (PSA) module was introduced in PS-Net to process the spatial information of multi-scale features and to establish long-term dependencies between multi-scale channel attentions, so that boundaries with sharp depth-gradient changes and the farthest region could be extracted. Then, the Mish function was used as the activation function in the decoder to further improve the performance of the network. Finally, training and evaluation were performed on the NYUD v2 (New York University Depth dataset v2) and iBims-1 (independent Benchmark images and matched scans v1) datasets. Experimental results on the iBims-1 dataset show that, compared with BS-Net, the proposed network reduces the Directed Depth Error (DDE) by 1.42 percentage points, and that the proportion of correctly predicted depth pixels reaches 81.69%. These results indicate that the proposed network achieves high accuracy in depth prediction.

Key words: depth estimation, Pyramid Split Attention (PSA), Three-Dimensional (3D) scene, depth feature, supervised learning
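
The abstract states that the PSA module links channel attentions across scales. A minimal sketch of that recalibration step follows, assuming the formulation of EPSANet (reference 23): each scale branch produces one attention score per channel, a softmax across branches normalises the scores, and the reweighted branches are concatenated. The function names are hypothetical, and each channel is reduced to a single scalar for brevity (real feature maps are two-dimensional); this is not the authors' implementation.

```python
import math


def softmax_across_branches(branch_scores):
    """Normalise attention scores over the branch axis.

    branch_scores[s][c] is the raw score of channel c in scale branch s.
    Returns weights of the same shape; for every channel c the weights
    of all branches sum to 1.
    """
    num_branches = len(branch_scores)
    num_channels = len(branch_scores[0])
    weights = [[0.0] * num_channels for _ in range(num_branches)]
    for c in range(num_channels):
        column = [branch_scores[s][c] for s in range(num_branches)]
        peak = max(column)  # subtract the maximum for numerical stability
        exps = [math.exp(v - peak) for v in column]
        total = sum(exps)
        for s in range(num_branches):
            weights[s][c] = exps[s] / total
    return weights


def reweight_and_concat(branch_features, branch_scores):
    """Scale each branch's channels by its weights, then concatenate."""
    weights = softmax_across_branches(branch_scores)
    out = []
    for feats, w in zip(branch_features, weights):
        out.extend(f * a for f, a in zip(feats, w))
    return out
```

Because the softmax runs across branches rather than within one branch, raising the score of a channel at one scale lowers the weight of the same channel at the other scales, which is one way to read the "dependence between multi-scale channel attentions" mentioned above.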

CLC Number: TP391.41

Cite this article: Wenju LI, Mengying LI, Liu CUI, Wanghui CHU, Yi ZHANG, Hui GAO. Monocular depth estimation method based on pyramid split attention network[J]. Journal of Computer Applications, 2023, 43(6): 1736-1742.

Figures and Tables (15)

Fig. 1 Structure of PS-Net depth estimation network

Fig. 2 Large-kernel refinement block

Fig. 3 PSA module

Fig. 4 SPC module

Fig. 5 SE weight module

Fig. 6 Graph of the Mish function
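
For reference, the Mish activation of reference 9 is defined as x · tanh(softplus(x)), with softplus(x) = ln(1 + e^x). The sketch below evaluates it for a scalar with the standard library only; the helper names are illustrative, and a real decoder would apply the function element-wise to tensors.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable ln(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))
```

The function is smooth and non-monotonic: it is close to the identity for large positive inputs, dips slightly below zero for moderately negative inputs, and approaches zero again as the input becomes very negative.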

Tab. 1 Planarity errors, depth boundary errors and directed depth errors on the iBims-1 dataset

Method	PE_plan↓/cm	PE_ori↓/(°)	DBE_acc↓/px	DBE_com↓/px	DDE_0↑/%	DDE_m↓/%	DDE_p↓/%
Method of Ref. [27]	6.97	28.56	5.07	7.83	70.10	29.46	0.43
Method of Ref. [22]	6.46	19.13	6.19	9.17	81.02	17.01	1.97
Method of Ref. [5]	3.45	43.44	2.98	4.96	82.27	16.38	1.34
Method of Ref. [26]	6.67	16.52	2.15	-	84.96	-	-
Method of Ref. [8]	4.33	27.89	2.17	5.33	80.89	17.70	1.42
Proposed method	3.89	29.44	2.14	5.18	81.69	16.28	2.02

Tab. 2 Relative depth errors and accuracies on the iBims-1 and NYUD v2 datasets

Dataset	Method	RMSE	REL	Log10	t_d=1.25	t_d=1.25²	t_d=1.25³
iBims-1	Method of Ref. [27]	1.610	0.350	0.190	0.220	0.550	0.780
	Method of Ref. [22]	1.200	0.260	0.130	0.500	0.780	0.910
	Method of Ref. [5]	1.140	0.220	0.110	0.510	0.840	0.940
	Method of Ref. [8]	1.160	0.230	0.120	0.510	0.830	0.930
	Proposed method	1.140	0.230	0.110	0.520	0.840	0.940
NYUD v2	Method of Ref. [28]	0.586	0.121	0.052	0.811	0.954	0.987
	Method of Ref. [27]	0.624	0.156	-	0.776	0.953	0.989
	Method of Ref. [22]	0.573	0.127	0.055	0.811	0.953	0.988
	Method of Ref. [29]	0.572	0.139	-	0.815	0.963	0.991
	Method of Ref. [30]	0.582	0.120	0.055	0.817	0.954	0.987
	Method of Ref. [8]	0.559	0.126	0.055	0.843	0.965	0.991
	Proposed method	0.558	0.126	0.054	0.843	0.968	0.991
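
The metric definitions are not given on this page. The sketch below assumes the definitions commonly used in the monocular depth literature: root mean squared error (RMSE), mean absolute relative error (REL), mean absolute log10 error (Log10), and the fraction of pixels whose ratio max(d/d*, d*/d) falls below a threshold t_d. The function name, the flat-list inputs and the rule of skipping non-positive depths are illustrative assumptions.

```python
import math


def depth_metrics(pred, gt, thresholds=(1.25, 1.25 ** 2, 1.25 ** 3)):
    """Compute common depth-estimation metrics over paired depth values.

    pred and gt are equal-length sequences of depths in the same unit.
    Pairs with a non-positive value are treated as invalid and skipped
    (an assumption made for this sketch).
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    n = len(pairs)
    if n == 0:
        raise ValueError("no valid depth pairs")
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    rel = sum(abs(p - g) / g for p, g in pairs) / n
    log10 = sum(abs(math.log10(p) - math.log10(g)) for p, g in pairs) / n
    # Threshold accuracy: share of pixels with max(p/g, g/p) < t.
    acc = [
        sum(1 for p, g in pairs if max(p / g, g / p) < t) / n
        for t in thresholds
    ]
    return {"rmse": rmse, "rel": rel, "log10": log10, "acc": acc}
```

Lower is better for the three error terms and higher is better for the threshold accuracies, which matches the direction of the comparison in the table.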

Tab. 3 Distance errors of the farthest region under different partition rates on the iBims-1 and NYUD v2 datasets

Dataset	Method	m=6	m=12	m=24
iBims-1	Method of Ref. [27]	0.193	0.201	0.213
	Method of Ref. [22]	0.169	0.192	0.210
	Method of Ref. [5]	0.170	0.190	0.203
	Method of Ref. [8]	0.181	0.200	0.211
	Proposed method	0.176	0.193	0.200
NYUD v2	Method of Ref. [27]	0.157	0.173	0.180
	Method of Ref. [22]	0.116	0.140	0.157
	Method of Ref. [8]	0.110	0.124	0.138
	Proposed method	0.104	0.128	0.134

Tab. 4 Depth boundary accuracies of different methods on the NYUD v2 dataset

Method	Precision (>0.25)	Recall (>0.25)	F1 score (>0.25)	Precision (>0.50)	Recall (>0.50)	F1 score (>0.50)	Precision (>1.00)	Recall (>1.00)	F1 score (>1.00)
Method of Ref. [28]	0.516	0.400	0.436	0.600	0.366	0.439	0.794	0.407	0.525
Method of Ref. [27]	0.577	0.626	0.591	0.531	0.509	0.506	0.617	0.489	0.533
Method of Ref. [22]	0.489	0.435	0.454	0.536	0.422	0.463	0.670	0.479	0.548
Method of Ref. [8]	0.639	0.502	0.556	0.663	0.504	0.565	0.756	0.537	0.620
Proposed method	0.640	0.503	0.557	0.666	0.507	0.566	0.759	0.540	0.621
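
As an illustration of how boundary precision, recall and their harmonic mean relate, the sketch below scores a predicted edge mask against a ground-truth edge mask. How the masks are extracted at each threshold is not specified on this page, so they are taken as given inputs; the function name and the flat-list mask format are hypothetical.

```python
def boundary_prf(pred_edges, gt_edges):
    """Precision, recall and F1 for two equal-length binary edge masks."""
    tp = sum(1 for p, g in zip(pred_edges, gt_edges) if p and g)
    fp = sum(1 for p, g in zip(pred_edges, gt_edges) if p and not g)
    fn = sum(1 for p, g in zip(pred_edges, gt_edges) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1
```

An F1 value computed from dataset-averaged precision and recall generally differs from the average of per-image F1 values, which may be why the composite column in the table is not exactly the harmonic mean of the two columns next to it.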

Fig. 7 Depth prediction results for scenes with complex backgrounds

Fig. 8 Depth prediction results for large and small objects

Fig. 9 Depth prediction results for a corridor

Tab. 5 Prediction results of ablation experiments on the iBims-1 dataset

Variant	t_d=1.25	t_d=1.25²	t_d=1.25³	REL	RMS	Log10
Baseline	0.507	0.823	0.926	0.243	1.172	0.122
+PSA	0.512	0.826	0.930	0.239	1.170	0.120
+DCE+BUBF	0.518	0.831	0.931	0.235	1.162	0.120
+PSA+DCE+BUBF	0.523	0.836	0.941	0.235	1.147	0.117

Tab. 6 Accuracies of predicted boundary pixels in depth maps under different thresholds on the NYUD v2 dataset

Threshold	Method	Precision	Recall	F1 score
>0.25	+PSA	0.644	0.483	0.546
	+DCE+BUBF	0.639	0.502	0.556
	+PSA+DCE+BUBF	0.640	0.503	0.557
>0.50	+PSA	0.667	0.488	0.558
	+DCE+BUBF	0.663	0.504	0.565
	+PSA+DCE+BUBF	0.666	0.507	0.566
>1.00	+PSA	0.764	0.525	0.614
	+DCE+BUBF	0.756	0.537	0.620
	+PSA+DCE+BUBF	0.759	0.540	0.621

References (30)

1	SNAVELY N, SEITZ S M, SZELISKI R. Skeletal graphs for efficient structure from motion [C]// Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2008: 1-8. DOI: 10.1109/cvpr.2008.4587678
2	ZHANG R, TSAI P S, CRYER J E, et al. Shape-from-shading: a survey [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1999, 21(8): 690-706. DOI: 10.1109/34.784284
3	BI T T, LIU Y, WENG D D, et al. Survey on supervised learning based depth estimation from a single image [J]. Journal of Computer-Aided Design and Computer Graphics, 2018, 30(8): 1383-1393 (in Chinese). DOI: 10.3724/sp.j.1089.2018.16882
4	ROY A, TODOROVIC S. Monocular depth estimation using neural regression forest [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 5506-5514. DOI: 10.1109/cvpr.2016.594
5	CHEN X T, CHEN X J, ZHA Z J. Structure-aware residual pyramid network for monocular depth estimation [C]// Proceedings of the 28th International Joint Conference on Artificial Intelligence. California: ijcai.org, 2019: 694-700. DOI: 10.24963/ijcai.2019/98
6	HU J J, OZAY M, ZHANG Y, et al. Revisiting single image depth estimation: toward higher resolution maps with accurate object boundaries [C]// Proceedings of the 2019 IEEE Winter Conference on Applications of Computer Vision. Piscataway: IEEE, 2019: 1043-1051. DOI: 10.1109/wacv.2019.00116
7	GUIZILINI V, AMBRUŞ R, PILLAI S, et al. 3D packing for self-supervised monocular depth estimation [C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 2482-2491. DOI: 10.1109/cvpr42600.2020.00256
8	XUE F, CAO J F, ZHOU Y, et al. Boundary-induced and scene-aggregated network for monocular depth prediction [J]. Pattern Recognition, 2021, 115: No.107901. DOI: 10.1016/j.patcog.2021.107901
9	MISRA D. Mish: a self regularized non-monotonic activation function [C]// Proceedings of the 2020 British Machine Vision Conference. Durham: BMVA Press, 2020: No.928.
10	MALIK J, ROSENHOLTZ R. Computing local surface orientation and shape from texture for curved surfaces [J]. International Journal of Computer Vision, 1997, 23(2): 149-168. DOI: 10.1023/a:1007958829620
11	SAXENA A, SCHULTE J, NG A Y. Depth estimation using monocular and stereo cues [C]// Proceedings of the 20th International Joint Conference on Artificial Intelligence. Menlo Park, CA: AAAI Press, 2007: 2197-2203.
12	LIU B Y, GOULD S, KOLLER D. Single image depth estimation from predicted semantic labels [C]// Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2010: 1253-1260. DOI: 10.1109/cvpr.2010.5539823
13	LI Y, CHEN X W, WANG Y, et al. Progress in deep learning based monocular image depth estimation [J]. Lasers and Optoelectronics Progress, 2019, 56(19): No.190001 (in Chinese). DOI: 10.3788/lop56.190001
14	EIGEN D, PUHRSCH C, FERGUS R. Depth map prediction from a single image using a multi-scale deep network [C]// Proceedings of the 27th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2014, 2: 2366-2374.
15	EIGEN D, FERGUS R. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture [C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 2650-2658. DOI: 10.1109/iccv.2015.304
16	LIU F Y, SHEN C H, LIN G S. Deep convolutional neural fields for depth estimation from a single image [C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 5162-5170. DOI: 10.1109/cvpr.2015.7299152
17	LIU F Y, SHEN C H, LIN G S, et al. Learning depth from single monocular images using deep convolutional neural fields [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(10): 2024-2039. DOI: 10.1109/tpami.2015.2505283
18	LI B, SHEN C H, DAI Y C, et al. Depth and surface normal estimation from monocular images using regression on deep features and hierarchical CRFs [C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 1119-1127. DOI: 10.1109/cvpr.2015.7298715
19	ALI U, BAYRAMLI B, ALSARHAN T, et al. A lightweight network for monocular depth estimation with decoupled body and edge supervision [J]. Image and Vision Computing, 2021, 113: No.104261. DOI: 10.1016/j.imavis.2021.104261
20	GARG R, VIJAY KUMAR B G, CARNEIRO G, et al. Unsupervised CNN for single view depth estimation: geometry to the rescue [C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9912. Cham: Springer, 2016: 740-756.
21	GODARD C, MAC AODHA O, BROSTOW G J. Unsupervised monocular depth estimation with left-right consistency [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6602-6611. DOI: 10.1109/cvpr.2017.699
22	LAINA I, RUPPRECHT C, BELAGIANNIS V, et al. Deeper depth prediction with fully convolutional residual networks [C]// Proceedings of the 4th International Conference on 3D Vision. Piscataway: IEEE, 2016: 239-248. DOI: 10.1109/3dv.2016.32
23	ZHANG H, ZU K K, LU J, et al. EPSANet: an efficient pyramid split attention block on convolutional neural network [C]// Proceedings of the 2022 Asian Conference on Computer Vision, LNCS 13843. Cham: Springer, 2023: 541-557.
24	Shanghai Institute of Technology. A depth estimation method and device based on pyramid split attention: 202210186323.9 [P]. 2022-05-31 (in Chinese).
25	KOCH T, LIEBEL L, FRAUNDORFER F, et al. Evaluation of CNN-based single-image depth estimation methods [C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11131. Cham: Springer, 2019: 331-348.
26	SWAMI K, BONDADA P V, BAJPAI P K. ACED: accurate and edge-consistent monocular depth estimation [C]// Proceedings of the 2020 IEEE International Conference on Image Processing. Piscataway: IEEE, 2020: 1376-1380. DOI: 10.1109/icip40778.2020.9191113
27	DHARMASIRI T, SPEK A, DRUMMOND T. Joint prediction of depths, normals and surface curvature from RGB images using CNNs [C]// Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems. Piscataway: IEEE, 2017: 1505-1512. DOI: 10.1109/iros.2017.8205954
28	XU D, RICCI E, OUYANG W L, et al. Multi-scale continuous CRFs as sequential deep networks for monocular depth estimation [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 161-169. DOI: 10.1109/cvpr.2017.25
29	LEE J H, HEO M, KIM K R, et al. Single-image depth estimation based on Fourier domain analysis [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 330-339. DOI: 10.1109/cvpr.2018.00042
30	XU D, OUYANG W L, WANG X G, et al. PAD-Net: multi-tasks guided prediction-and-distillation network for simultaneous depth estimation and scene parsing [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 675-684. DOI: 10.1109/cvpr.2018.00077
