Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (9): 2897-2903. DOI: 10.11772/j.issn.1001-9081.2022091342

• Multimedia Computing and Computer Simulation •

Focal stack depth estimation method based on defocus blur

Meng ZHOU, Zhangjin HUANG

  1. School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, China
  • Received:2022-09-15 Revised:2022-11-23 Accepted:2022-11-30 Online:2023-02-22 Published:2023-09-10
  • Contact: Zhangjin HUANG
  • About author:ZHOU Meng, born in 1993 in Jingmen, Hubei, M. S. candidate, CCF member. His research interests include 3D vision and depth estimation.
  • Supported by:
    National Natural Science Foundation of China(61877056)

Abstract:

The existing monocular depth estimation methods often use image semantic information to obtain depth and ignore another important cue: defocus blur. At the same time, defocus blur based depth estimation methods usually take the focal stack or gradient information as input, without considering two characteristics of the focal stack: the small variation of blur between adjacent image layers, and the blur ambiguity on the two sides of the focal plane. To address these deficiencies of the existing focal stack depth estimation methods, a lightweight network based on three-dimensional convolution was proposed. Firstly, a three-dimensional perception module was designed to coarsely extract the blur information of the focal stack. Secondly, the extracted information was concatenated with the RGB channel difference features of the focal stack output by a channel difference module to construct a focus volume able to identify blur ambiguity patterns. Finally, multi-scale three-dimensional convolution was used to predict depth. Experimental results show that compared with methods such as All in Focus Depth Network (AiFDepthNet), the proposed method achieves the best results on seven metrics, including Mean Absolute Error (MAE), on the DefocusNet dataset, as well as the best results on four metrics and the second best on three metrics on the NYU Depth V2 dataset. At the same time, the lightweight design shortens the inference time of the proposed method by 43.92% to 70.20% and 47.91% to 77.01% on the two datasets, respectively. These results show that the proposed method can effectively improve both the accuracy and the inference speed of focal stack depth estimation.
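
As background (standard thin-lens optics, not a formula taken from the paper), the blur ambiguity mentioned above can be made concrete with the circle-of-confusion model: a lens of focal length $f$ with aperture diameter $A$, focused at distance $s_f$, images a point at distance $s$ as a blur disc of diameter

$$c = \frac{A\,f\,\lvert s - s_f \rvert}{s\,(s_f - f)}$$

Because $c$ depends on $\lvert s - s_f \rvert$, a point in front of the focal plane and a suitably placed point behind it can produce the same blur size, so a single defocused image cannot tell the two apart; observing how the blur evolves across the layers of a focal stack is what resolves this ambiguity.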

Key words: monocular depth estimation, focal stack, defocus blur, focus volume, blur ambiguity
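
To make the pipeline described in the abstract concrete, the following is a minimal PyTorch sketch of such an architecture. It is an illustrative reconstruction, not the authors' code: the module names, channel widths, the exact form of the channel difference features, the multi-scale (dilated) head, and the soft-selection depth readout over assumed normalized focus distances are all assumptions.

import torch
import torch.nn as nn

class ChannelDiff(nn.Module):
    # Channel difference module (assumed form): pairwise differences of the
    # R, G and B planes, which respond to the chromatic component of defocus blur.
    def forward(self, stack):                                # (B, S, 3, H, W)
        r, g, b = stack[:, :, 0], stack[:, :, 1], stack[:, :, 2]
        return torch.stack((r - g, g - b, b - r), dim=2)     # (B, S, 3, H, W)

class FocusVolumeNet(nn.Module):
    def __init__(self, feat=16):
        super().__init__()
        # 3D perception module: 3D convolutions over (stack, height, width)
        # that coarsely extract how blur varies across the focal stack.
        self.perceive = nn.Sequential(
            nn.Conv3d(3, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(feat, feat, 3, padding=1), nn.ReLU(inplace=True))
        self.diff = ChannelDiff()
        # Multi-scale 3D convolution head (dilation widens the receptive field)
        # that reduces the focus volume to a per-layer focus score.
        self.head = nn.Sequential(
            nn.Conv3d(feat + 3, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(feat, feat, 3, padding=2, dilation=2), nn.ReLU(inplace=True),
            nn.Conv3d(feat, 1, 3, padding=1))

    def forward(self, stack):                                # (B, S, 3, H, W)
        x = stack.permute(0, 2, 1, 3, 4)                     # (B, 3, S, H, W)
        blur_feat = self.perceive(x)                         # (B, F, S, H, W)
        diff_feat = self.diff(stack).permute(0, 2, 1, 3, 4)  # (B, 3, S, H, W)
        volume = torch.cat((blur_feat, diff_feat), dim=1)    # focus volume
        score = self.head(volume).squeeze(1)                 # (B, S, H, W)
        prob = torch.softmax(score, dim=1)                   # per-pixel focus probability
        # Depth as the expectation of the (assumed, normalized) focus distances.
        dist = torch.linspace(0.1, 1.0, stack.size(1), device=stack.device)
        return (prob * dist.view(1, -1, 1, 1)).sum(dim=1)    # (B, H, W)

# Usage: depth = FocusVolumeNet()(torch.rand(2, 5, 3, 64, 64))  # -> (2, 64, 64)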
