Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (9): 2897-2903. DOI: 10.11772/j.issn.1001-9081.2022091342
• Multimedia computing and computer simulation •
Received: 2022-09-15
Revised: 2022-11-23
Accepted: 2022-11-30
Online: 2023-02-22
Published: 2023-09-10
Corresponding author: Zhangjin HUANG (黄章进)
About author: ZHOU Meng, born in 1993 in Jingmen, Hubei, M. S. candidate, CCF member. His research interests include 3D vision and depth estimation.
Meng ZHOU, Zhangjin HUANG. Focal stack depth estimation method based on defocus blur[J]. Journal of Computer Applications, 2023, 43(9): 2897-2903.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2022091342
| Dataset | Method | MAE | MSE | RMS | logRMS | absRel | sqrRel | Inference time/ms |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DefocusNet | AiFDepthNet | 7.880E-2 | 2.414E-2 | 1.450E-1 | 0.258 | 0.161 | 4.030E-2 | 23.52 |
| DefocusNet | DefocusNet | | | | | | | 20.47 |
| DefocusNet | Ref. [ ] | 7.289E-2 | 2.250E-2 | 1.390E-1 | 0.262 | 0.146 | 3.743E-2 | |
| DefocusNet | Proposed method | 5.326E-2 | 1.190E-2 | 9.900E-2 | 0.182 | 0.115 | 1.613E-2 | 7.01 |
| NYU Depth V2 | AiFDepthNet | 1.647E+0 | 2.768E+0 | 1.618E+0 | 1.834 | 5.572 | 9.498E+0 | 38.98 |
| NYU Depth V2 | DefocusNet | 9.934E-3 | 8.621E-2 | 2.590E-2 | | | | 25.53 |
| NYU Depth V2 | Ref. [ ] | 8.829E-2 | 1.008E-1 | 0.329 | 0.253 | 3.260E-2 | | |
| NYU Depth V2 | Proposed method | 6.804E-2 | 0.267 | 0.205 | | | | 8.96 |
Tab. 1 Results of different methods on two datasets
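The error metrics reported in Tab. 1 are the standard depth-estimation measures (cf. Eigen et al. [1]). The sketch below is a minimal NumPy implementation under that assumption; the function and variable names are ours, not from the paper:

```python
import numpy as np

def depth_metrics(pred, gt, eps=1e-8):
    """Standard depth-error metrics as reported in Tab. 1.

    pred, gt: same-shape arrays of predicted and ground-truth depth
    (positive values). Definitions follow Eigen et al. [1].
    """
    pred = pred.astype(np.float64)
    gt = gt.astype(np.float64)
    diff = pred - gt
    mae = np.mean(np.abs(diff))                                # MAE
    mse = np.mean(diff ** 2)                                   # MSE
    rms = np.sqrt(np.mean(diff ** 2))                          # RMS
    log_rms = np.sqrt(np.mean(                                 # logRMS
        (np.log(pred + eps) - np.log(gt + eps)) ** 2))
    abs_rel = np.mean(np.abs(diff) / (gt + eps))               # absRel
    sqr_rel = np.mean(diff ** 2 / (gt + eps))                  # sqrRel
    return dict(MAE=mae, MSE=mse, RMS=rms, logRMS=log_rms,
                absRel=abs_rel, sqrRel=sqr_rel)
```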
| Method | MAE | RMS | absRel | sc-inv | ssitrim |
| --- | --- | --- | --- | --- | --- |
| AiFDepthNet | 0.239 | 0.312 | 0.276 | 0.319 | 0.509 |
| DefocusNet | 0.184 | 0.322 | 0.188 | 0.213 | 0.209 |
| Ref. [ ] | 0.097 | 0.141 | 0.126 | 0.157 | 0.209 |
| Proposed method | 0.096 | 0.114 | 0.162 | 0.088 | 0.250 |
Tab. 2 Results of different methods trained on the DefocusNet dataset and tested on the NYU Depth V2 dataset
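Tab. 2 additionally reports sc-inv and ssitrim. A minimal sketch under the assumption that sc-inv is the scale-invariant log error of Eigen et al. [1] and that ssitrim is a trimmed absolute error after least-squares scale-and-shift alignment; the 20% trim ratio is also an assumption:

```python
import numpy as np

def sc_inv(pred, gt, eps=1e-8):
    # Scale-invariant log error (our reading of sc-inv; cf. Eigen et al. [1]).
    d = np.log(pred + eps) - np.log(gt + eps)
    return np.sqrt(np.mean(d ** 2) - np.mean(d) ** 2)

def ssitrim(pred, gt, trim=0.2):
    # Trimmed MAE after aligning the prediction to the ground truth with a
    # least-squares scale s and shift t (our reading of ssitrim; the 0.2
    # trim ratio is an assumption borrowed from MiDaS-style evaluation).
    A = np.stack([pred.ravel(), np.ones(pred.size)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, gt.ravel(), rcond=None)
    residual = np.abs(s * pred.ravel() + t - gt.ravel())
    kept = np.sort(residual)[: int((1.0 - trim) * residual.size)]
    return kept.mean()
```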
In the table below, the columns group as: feature extraction (3D perception, Siamese network), focal volume (Naive, Diff-sxy, Diff-RGB), prediction (Layered, DO), and evaluation metrics; a check mark (√) indicates that the component is enabled.

| Experiment | 3D perception | Siamese network | Naive | Diff-sxy | Diff-RGB | Layered | DO | MAE | RMS | sqrRel |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | — | √ | √ | — | — | √ | — | 6.252E-2 | 0.118 | 2.606E-2 |
| 2 | √ | — | √ | — | — | √ | — | 6.081E-2 | 0.110 | 2.081E-2 |
| 3 | √ | — | √ | — | — | — | √ | 1.658E-1 | 0.264 | 9.776E-2 |
| 4 | √ | — | — | √ | — | √ | — | 5.846E-2 | 0.129 | 4.059E-2 |
| 5 | √ | — | — | — | √ | √ | — | 5.326E-2 | 0.099 | 1.613E-2 |
Tab. 3 Results of ablation experiments on DefocusNet dataset
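In Tab. 3, "Naive" stacks the per-slice features directly, while Diff-sxy and Diff-RGB build differential focal volumes from differences between adjacent focal slices (cf. the differential focus volume of [16]). The PyTorch sketch below illustrates the general idea only; the function name, channel layout, and the choice to keep the original slices are our assumptions, not the paper's exact design:

```python
import torch

def differential_focus_volume(feats: torch.Tensor) -> torch.Tensor:
    """Build a differential focal volume from per-slice features.

    feats: (B, S, C, H, W) features of a focal stack with S focus
    distances. The volume stores first-order differences between
    adjacent slices, which respond to how defocus blur changes with
    focus distance (our reading of the Diff-RGB design, cf. [16]).
    """
    diff = feats[:, 1:] - feats[:, :-1]              # (B, S-1, C, H, W)
    # Keep the original slices too, so absolute focus cues survive.
    return torch.cat([feats[:, :-1], diff], dim=2)   # (B, S-1, 2C, H, W)

# Toy usage: a stack of 5 focal slices with 16-channel features.
vol = differential_focus_volume(torch.randn(2, 5, 16, 32, 32))
print(vol.shape)  # torch.Size([2, 4, 32, 32, 32])
```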
[1] EIGEN D, PUHRSCH C, FERGUS R. Depth map prediction from a single image using a multi-scale deep network[C]// Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2. Cambridge: MIT Press, 2014: 2366-2374. DOI: 10.48550/arXiv.1406.2283.
[2] LAINA I, RUPPRECHT C, BELAGIANNIS V, et al. Deeper depth prediction with fully convolutional residual networks[C]// Proceedings of the 4th International Conference on 3D Vision. Piscataway: IEEE, 2016: 239-248. DOI: 10.1109/3dv.2016.32.
[3] YIN W, LIU Y F, SHEN C H, et al. Enforcing geometric constraints of virtual normal for depth prediction[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 5683-5692. DOI: 10.1109/iccv.2019.00578.
[4] LI Z Y, WANG X Y, LIU X M, et al. BinsFormer: revisiting adaptive bins for monocular depth estimation[EB/OL]. (2022-04-03) [2022-04-17].
[5] SCHARSTEIN D, SZELISKI R. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms[J]. International Journal of Computer Vision, 2002, 47(1/2/3): 7-42. DOI: 10.1023/a:1014573219977.
[6] MAYER N, ILG E, HÄUSSER P, et al. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 4040-4048. DOI: 10.1109/cvpr.2016.438.
[7] KENDALL A, MARTIROSYAN H, DASGUPTA S, et al. End-to-end learning of geometry and context for deep stereo regression[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 66-75. DOI: 10.1109/iccv.2017.17.
[8] GARG R, KUMAR B G V, CARNEIRO G, et al. Unsupervised CNN for single view depth estimation: geometry to the rescue[C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9912. Cham: Springer, 2016: 740-756.
[9] ZHOU T H, BROWN M, SNAVELY N, et al. Unsupervised learning of depth and ego-motion from video[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6612-6619. DOI: 10.1109/cvpr.2017.700.
[10] GODARD C, MAC AODHA O, FIRMAN M, et al. Digging into self-supervised monocular depth estimation[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 3827-3837. DOI: 10.1109/iccv.2019.00393.
[11] GODARD C, MAC AODHA O, BROSTOW G J. Unsupervised monocular depth estimation with left-right consistency[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6602-6611. DOI: 10.1109/cvpr.2017.699.
[12] SUWAJANAKORN S, HERNÁNDEZ C, SEITZ S M. Depth from focus with your mobile phone[C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 3497-3506. DOI: 10.1109/cvpr.2015.7298972.
[13] MAXIMOV M, GALIM K, LEAL-TAIXÉ L. Focus on defocus: bridging the synthetic to real domain gap for depth estimation[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 1071-1080. DOI: 10.1109/cvpr42600.2020.00115.
[14] WANG N H, WANG R, LIU Y L, et al. Bridging unsupervised and supervised depth from focus via all-in-focus supervision[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 12601-12611. DOI: 10.1109/iccv48922.2021.01239.
[15] FUJIMURA Y, IIYAMA M, FUNATOMI T, et al. Deep depth from focal stack with defocus model for camera-setting invariance[EB/OL]. (2022-02-26) [2022-03-12].
[16] YANG F T, HUANG X L, ZHOU Z H. Deep depth from focus with differential focus volume[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 12632-12641. DOI: 10.1109/cvpr52688.2022.01231.
[17] HAZIRBAS C, SOYER S G, STAAB M C, et al. Deep depth from focus[C]// Proceedings of the 2018 Asian Conference on Computer Vision, LNCS 11363. Cham: Springer, 2019: 525-541.
[18] CERUSO S, BONAQUE-GONZÁLEZ S, OLIVA-GARCÍA R, et al. Relative multiscale deep depth from focus[J]. Signal Processing: Image Communication, 2021, 99: No. 116417. DOI: 10.1016/j.image.2021.116417.
[19] GUO Q, FENG W, ZHOU C, et al. Learning dynamic Siamese network for visual object tracking[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 1781-1789. DOI: 10.1109/iccv.2017.196.
[20] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 6000-6010.
[21] NAYAR S K, WATANABE M, NOGUCHI M. Real-time focus range sensor[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1996, 18(12): 1186-1198. DOI: 10.1109/34.546256.
[22] SRINIVASAN P P, GARG R, WADHWA N, et al. Aperture supervision for monocular depth estimation[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 6393-6401. DOI: 10.1109/cvpr.2018.00669.
[23] CARVALHO M, LE SAUX B, TROUVÉ-PELOUX P, et al. Deep depth from defocus: how can defocus blur improve 3D estimation using dense neural networks?[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11129. Cham: Springer, 2019: 307-323.
[24] GALETTO F J, DENG G. Single image deep defocus estimation and its applications[EB/OL]. (2021-12-14) [2022-02-19]. DOI: 10.1007/s00371-022-02609-9.
[25] SZEGEDY C, VANHOUCKE V, IOFFE S, et al. Rethinking the inception architecture for computer vision[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 2818-2826. DOI: 10.1109/cvpr.2016.308.
[26] KASHIWAGI M, MISHIMA N, KOZAKAYA T, et al. Deep depth from aberration map[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 4069-4078. DOI: 10.1109/iccv.2019.00417.
[27] WON C, JEON H G. Learning depth from focus in the wild[C]// Proceedings of the 2022 European Conference on Computer Vision, LNCS 13661. Cham: Springer, 2022: 1-18.