Image depth estimation model based on atrous convolutional neural network

Journal of Computer Applications, 2019, Vol. 39, Issue (1): 267-274. DOI: 10.11772/j.issn.1001-9081.2018061305

• Virtual Reality and Multimedia Computing •

LIAO Bin, LI Haowen

School of Computer Science and Information Engineering, Hubei University, Wuhan Hubei 430062, China

Received: 2018-06-22 Revised: 2018-08-09 Online: 2019-01-10 Published: 2019-01-21
Corresponding author: LIAO Bin
About the authors: LIAO Bin (born 1979), male, from Xiangyang, Hubei, professor, Ph.D.; main research interest: image and video processing. LI Haowen (born 1993), male, from Luoyang, Henan, master's student; main research interest: image and video processing.
Supported by:
This work is partially supported by the National Natural Science Foundation of China (61300125).

Abstract

Abstract: To address the poor estimation quality and inaccurate depth values that traditional machine learning methods produce for single-image depth estimation, a depth estimation model based on an Atrous Convolutional Neural Network (ACNN) is proposed. First, a Convolutional Neural Network (CNN) extracts feature maps from the original image layer by layer. Second, an atrous convolution structure fuses the spatial information of the original image with the extracted low-level image features to produce an initial depth map. Finally, the initial depth map is fed into a Conditional Random Field (CRF), which refines it under three joint constraints (pixel spatial position, grayscale, and the grayscale gradient) to produce the final depth map. Model usability verification and error estimation were carried out on objective data sets. The experimental results show that the proposed algorithm achieves lower error and higher accuracy: the Root Mean Square Error (RMSE) is reduced by 30.86% on average compared with machine learning based algorithms, and the accuracy is improved by 14.5% compared with deep learning based algorithms. The proposed algorithm improves substantially in both error metrics and visual quality, indicating that the model can obtain better results in image depth estimation.
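The abstract does not spell out the atrous convolution itself, so the following is only a minimal sketch of the general operation, not the authors' network. It assumes a square kernel, "valid" borders, and plain nested lists; the function name `atrous_conv2d` and the toy 5x5 input are hypothetical. It shows how a dilation rate widens the receptive field of a 3x3 kernel without adding weights.

```python
def atrous_conv2d(image, kernel, rate=1):
    """Valid-mode 2D atrous (dilated) cross-correlation on nested lists.

    `rate` is the dilation rate: kernel taps are sampled `rate` pixels
    apart, so a k x k kernel covers (k - 1) * rate + 1 pixels per side
    while still using only k * k weights.
    """
    k = len(kernel)                      # assumes a square k x k kernel
    span = (k - 1) * rate + 1            # effective receptive field per side
    out_h = len(image) - span + 1
    out_w = len(image[0]) - span + 1
    if out_h <= 0 or out_w <= 0:
        raise ValueError("image is smaller than the dilated kernel")
    out = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    acc += kernel[i][j] * image[y + i * rate][x + j * rate]
            row.append(acc)
        out.append(row)
    return out


# Hypothetical 5x5 input whose pixel value is 5 * row + column.
image = [[float(5 * r + c) for c in range(5)] for r in range(5)]
box = [[1.0] * 3 for _ in range(3)]      # 3x3 kernel of ones

dense = atrous_conv2d(image, box, rate=1)    # 3x3 output, 3x3 receptive field
atrous = atrous_conv2d(image, box, rate=2)   # 1x1 output, 5x5 receptive field
```

With `rate=2` the same nine weights reach across the whole 5x5 patch, which is the property that lets an atrous layer bring wider spatial context into a feature map without extra parameters.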

Key words: atrous convolution, Convolutional Neural Network (CNN), Conditional Random Field (CRF), depth estimation, deep learning
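As a rough illustration of the error metric named in the abstract, the sketch below computes RMSE over a handful of depth values and the percentage by which one RMSE is lower than another. The depth values are made up for illustration and are not the paper's data, the helper names are hypothetical, and treating the reported 30.86% as a relative reduction is an assumption, since the abstract does not say how it was computed.

```python
import math


def rmse(predicted, ground_truth):
    """Root mean square error between two equal-length depth sequences."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - g) ** 2 for p, g in zip(predicted, ground_truth)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return (baseline - proposed) / baseline * 100.0


# Hypothetical depths in metres for four pixels; not data from the paper.
ground_truth = [1.0, 2.0, 3.0, 4.0]
baseline_pred = [3.0, 2.0, 3.0, 8.0]     # per-pixel errors: 2, 0, 0, 4
proposed_pred = [1.0, 2.0, 3.0, 6.0]     # per-pixel errors: 0, 0, 0, 2

baseline_rmse = rmse(baseline_pred, ground_truth)
proposed_rmse = rmse(proposed_pred, ground_truth)
reduction = relative_reduction(baseline_rmse, proposed_rmse)
```

Because the errors are squared before averaging, a single badly estimated pixel dominates the RMSE, which is why the baseline's one 4 m miss weighs so heavily here.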

CLC Number: TP391.413

LIAO Bin, LI Haowen. Image depth estimation model based on atrous convolutional neural network[J]. Journal of Computer Applications, 2019, 39(1): 267-274.

