Journal of Computer Applications ›› 2017, Vol. 37 ›› Issue (12): 3458-3466. DOI: 10.11772/j.issn.1001-9081.2017.12.3458

• Artificial Intelligence •

Three-dimensional spatial structured encoding deep network for RGB-D scene parsing

WANG Zeyu1, WU Yanxia1, ZHANG Guoyin1, BU Shuhui2

  1. College of Computer Science and Technology, Harbin Engineering University, Harbin, Heilongjiang 150001, China;
    2. School of Aeronautics, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China
  • Received: 2017-05-15 Revised: 2017-07-24 Online: 2017-12-10 Published: 2017-12-18
  • Corresponding author: WU Yanxia
  • About the authors: WANG Zeyu (1989-), male, born in Zhengzhou, Henan, Ph. D. candidate, research interests: machine learning, deep learning, computer vision; WU Yanxia (1979-), female, born in Harbin, Heilongjiang, associate professor, Ph. D., CCF member, research interests: machine learning, computer vision; ZHANG Guoyin (1962-), male, born in Harbin, Heilongjiang, professor, Ph. D., CCF member, research interests: machine learning, computer vision; BU Shuhui (1978-), male, born in Luoyang, Henan, professor, Ph. D., CCF member, research interests: machine learning, deep learning, computer vision.
  • Supported by:
    This work is partially supported by the National Key Research and Development Program (2016YFB1000400), the National Natural Science Foundation of China (60903098), and the Central University Free Exploration Fund (HEUCF100606).


Abstract: Effective feature extraction from RGB-D images and accurate learning of 3D spatial structure are the two keys to improving the results of RGB-D scene parsing. The Fully Convolutional Neural Network (FCNN) has a powerful ability of feature extraction; however, it cannot sufficiently learn 3D spatial structure information. To solve this problem, a new neural network architecture called the Three-dimensional Spatial Structured Encoding Deep Network (3D-SSEDN) was proposed, in which an embedded structural learning layer organically combines a graphical model network with a spatial structured encoding algorithm, so that the spatial distribution of objects in 3D space can be accurately learned and described. Through the proposed 3D-SSEDN, not only can the Hierarchical Visual Feature (HVF) and Hierarchical Depth Feature (HDF) containing hierarchical shape and depth information be extracted, but a spatial structure feature containing 3D structural information can also be generated; a hybrid feature is then obtained by fusing the above three kinds of features, which expresses the semantic information of RGB-D images more accurately. Experimental results on the standard RGB-D datasets NYUDv2 and SUNRGBD show that, compared with previous state-of-the-art scene parsing methods, the proposed 3D-SSEDN significantly improves the results of RGB-D scene parsing.
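As a rough illustration of the pipeline the abstract describes, the sketch below shows how a visual feature, a depth feature, and a spatial feature derived from 3D position could be fused into a hybrid feature for per-pixel labeling. It is a minimal sketch assuming a PyTorch-style implementation; the module names, channel widths, coordinate-based spatial encoding, and fusion by concatenation are hypothetical stand-ins, not the paper's actual 3D-SSEDN, its structural learning layer, or its HVF/HDF extractors.

    # Hypothetical sketch of visual + depth + spatial feature fusion (not the authors' code).
    import torch
    import torch.nn as nn

    class ToyBranch(nn.Module):
        """Stand-in for an FCNN branch that keeps spatial resolution."""
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            )
        def forward(self, x):
            return self.net(x)

    class HybridFusionNet(nn.Module):
        """Fuses visual, depth, and spatial features, then classifies each pixel."""
        def __init__(self, num_classes=40, feat=32, spatial=16):
            super().__init__()
            self.visual = ToyBranch(3, feat)   # plays the role of the visual (HVF-like) branch
            self.depth = ToyBranch(1, feat)    # plays the role of the depth (HDF-like) branch
            # A 1x1 convolution over per-pixel (x, y, depth) coordinates is used here as a
            # crude stand-in for the paper's spatial structured encoding of 3D object layout.
            self.spatial = nn.Conv2d(3, spatial, 1)
            self.classifier = nn.Conv2d(2 * feat + spatial, num_classes, 1)

        def forward(self, rgb, depth):
            b, _, h, w = rgb.shape
            ys = torch.linspace(-1, 1, h, device=rgb.device).view(1, 1, h, 1).expand(b, 1, h, w)
            xs = torch.linspace(-1, 1, w, device=rgb.device).view(1, 1, 1, w).expand(b, 1, h, w)
            coords = torch.cat([xs, ys, depth], dim=1)          # rough 3D position per pixel
            hybrid = torch.cat([self.visual(rgb),
                                self.depth(depth),
                                self.spatial(coords)], dim=1)   # hybrid feature by concatenation
            return self.classifier(hybrid)                      # per-pixel class scores

    if __name__ == "__main__":
        rgb = torch.randn(2, 3, 64, 64)
        d = torch.rand(2, 1, 64, 64)
        print(HybridFusionNet()(rgb, d).shape)  # torch.Size([2, 40, 64, 64])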

Key words: Fully Convolutional Neural Network (FCNN), graphical model, spatial structured encoding algorithm, Hierarchical Visual Feature (HVF), Hierarchical Depth Feature (HDF), spatial structure feature, hybrid feature

CLC number: