Three-dimensional spatial structured encoding deep network for RGB-D scene parsing

doi:10.11772/j.issn.1001-9081.2017.12.3458

Journal of Computer Applications, 2017, Vol. 37, Issue (12): 3458-3466.

WANG Zeyu¹, WU Yanxia¹, ZHANG Guoyin¹, BU Shuhui²

1. College of Computer Science and Technology, Harbin Engineering University, Harbin Heilongjiang 150001, China;
2. School of Aeronautics, Northwestern Polytechnical University, Xi'an Shaanxi 710072, China

Received: 2017-05-15 Revised: 2017-07-24 Online: 2017-12-10 Published: 2017-12-18
Corresponding author: WU Yanxia
About the authors: WANG Zeyu (1989-), male, from Zhengzhou, Henan, Ph.D. candidate; research interests: machine learning, deep learning, computer vision. WU Yanxia (1979-), female, from Harbin, Heilongjiang, associate professor, Ph.D., CCF member; research interests: machine learning, computer vision. ZHANG Guoyin (1962-), male, from Harbin, Heilongjiang, professor, Ph.D., CCF member; research interests: machine learning, computer vision. BU Shuhui (1978-), male, from Luoyang, Henan, professor, Ph.D., CCF member; research interests: machine learning, deep learning, computer vision.
Supported by:
This work is partially supported by the National Key Research and Development Program (2016YFB1000400), the National Natural Science Foundation of China (60903098), and the Central University Free Exploration Fund (HEUCF100606).

Abstract

Abstract: Effective feature extraction from RGB-D images and accurate 3D spatial structure learning are the two keys to improving the performance of RGB-D scene parsing. The Fully Convolutional Neural Network (FCNN) has a powerful feature extraction ability, but it cannot sufficiently learn 3D spatial structure information. To solve this problem, a new neural network architecture called Three-dimensional Spatial Structured Encoding Deep Network (3D-SSEDN) was proposed. Its embedded structural learning layer organically combines a graphical model network with a spatial structured encoding algorithm, so that the 3D spatial distribution of objects can be learned and described fairly accurately. With the proposed 3D-SSEDN, not only can the Hierarchical Visual Feature (HVF) and Hierarchical Depth Feature (HDF), which contain hierarchical shape and depth information, be extracted, but the spatial structure feature containing 3D structural information can also be generated. Furthermore, a hybrid feature is obtained by fusing these three kinds of features, so that the semantic information of RGB-D images can be expressed more accurately. The experimental results on the standard RGB-D datasets NYUDv2 and SUNRGBD show that, compared with existing state-of-the-art scene parsing methods, the proposed 3D-SSEDN can significantly improve the performance of RGB-D scene parsing.

Key words: Fully Convolutional Neural Network (FCNN), graphical model, spatial structured encoding algorithm, Hierarchical Visual Feature (HVF), Hierarchical Depth Feature (HDF), spatial structure feature, hybrid feature
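
The abstract states that the hybrid feature is obtained by fusing the HVF, the HDF and the spatial structure feature, but it does not say how the fusion is performed. The following is a minimal sketch that assumes plain per-region concatenation; the function name `fuse_features`, the feature dimensions and the sample values are all hypothetical and are not taken from the paper.

```python
def fuse_features(hvf, hdf, spatial):
    """Concatenate three per-region feature lists into one hybrid feature.

    Each argument is a list with one feature vector per image region.
    Concatenation is an assumed fusion rule, used here only for illustration.
    """
    if not (len(hvf) == len(hdf) == len(spatial)):
        raise ValueError("all three inputs must describe the same regions")
    # For every region, join its visual, depth and spatial vectors end to end.
    return [list(v) + list(d) + list(s) for v, d, s in zip(hvf, hdf, spatial)]


# Two hypothetical regions with toy dimensions (2 visual, 1 depth, 3 spatial).
hvf = [[0.1, 0.2], [0.3, 0.4]]
hdf = [[1.0], [2.0]]
spatial = [[5.0, 6.0, 7.0], [8.0, 9.0, 10.0]]

hybrid = fuse_features(hvf, hdf, spatial)
print(len(hybrid), len(hybrid[0]))
```

Under this assumption each region ends up with a single vector whose length is the sum of the three input lengths, which a classifier could then label per region.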

WANG Zeyu, WU Yanxia, ZHANG Guoyin, BU Shuhui. Three-dimensional spatial structured encoding deep network for RGB-D scene parsing[J]. Journal of Computer Applications, 2017, 37(12): 3458-3466.
