Journal of Computer Applications, 2019, Vol. 39, Issue (10): 2847-2851. DOI: 10.11772/j.issn.1001-9081.2019040711

• Artificial intelligence •

Simultaneous localization and semantic mapping of indoor dynamic scene based on semantic segmentation

XI Zhihong, HAN Shuangquan, WANG Hongxu   

  1. School of Information and Communication Engineering, Harbin Engineering University, Harbin Heilongjiang 150001, China
  • Received: 2019-04-26 Revised: 2019-06-08 Online: 2019-10-10 Published: 2019-10-14


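  • Corresponding author: HAN Shuangquan (韩双全)
  • About the authors: XI Zhihong (席志红), born in 1965, female, from Harbin, Heilongjiang; Ph.D., professor. Her research interests include image processing and indoor positioning. HAN Shuangquan (韩双全), born in 1993, male, from Weifang, Shandong; M.S. candidate. His research interests include visual SLAM and image understanding. WANG Hongxu (王洪旭), born in 1994, male, from Songyuan, Jilin; M.S. candidate. His research interests include visual SLAM and image analysis.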

Abstract: To address the problem that dynamic objects degrade pose estimation in indoor Simultaneous Localization And Mapping (SLAM) systems, a semantic-segmentation-based SLAM system for dynamic scenes is proposed. First, each image captured by the camera is semantically segmented by the Pyramid Scene Parsing Network (PSPNet). Then, image feature points are extracted, those lying on dynamic objects are removed, and the camera pose is estimated from the remaining static feature points. Finally, a semantic point cloud map and a semantic octree map are constructed. Repeated comparison tests on five dynamic sequences from public datasets show that, compared with a SLAM system using the SegNet network, the proposed system reduces the standard deviation of the absolute trajectory error by 6.9% to 89.8%, and in highly dynamic scenes improves the standard deviation of the translational and rotational drift by up to 73.61% and 72.90%, respectively. These results indicate that the proposed system significantly reduces pose estimation error and estimates the camera pose accurately in dynamic scenes.

Key words: semantic segmentation, dynamic scene, indoor scene, pose estimation, Visual Simultaneous Localization And Mapping (VSLAM), semantic Simultaneous Localization And Mapping (SLAM)
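
The feature-filtering step described in the abstract (discarding feature points that fall on dynamic objects before pose estimation) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `filter_static_keypoints`, the label id `15` standing in for a dynamic class, and the toy label map are all assumptions made for illustration. The sketch only assumes that a segmentation network has already produced a per-pixel class-id map.

```python
import numpy as np

# Hypothetical label id for a dynamic class (for example "person").
# The real id depends on the label map of the segmentation model in use.
DYNAMIC_LABELS = {15}


def filter_static_keypoints(keypoints, label_map, dynamic_labels=DYNAMIC_LABELS):
    """Keep only keypoints whose pixel is not labelled as a dynamic class.

    keypoints: (N, 2) array of (x, y) pixel coordinates.
    label_map: (H, W) integer array of per-pixel class ids.
    """
    keypoints = np.asarray(keypoints, dtype=float).reshape(-1, 2)
    height, width = label_map.shape
    # Round to the nearest pixel and clamp to the image bounds.
    cols = np.clip(np.rint(keypoints[:, 0]).astype(int), 0, width - 1)
    rows = np.clip(np.rint(keypoints[:, 1]).astype(int), 0, height - 1)
    labels = label_map[rows, cols]
    is_dynamic = np.isin(labels, list(dynamic_labels))
    return keypoints[~is_dynamic]


# Toy example: a 4x6 label map with a small "dynamic" region in the middle.
label_map = np.zeros((4, 6), dtype=int)
label_map[1:3, 2:4] = 15
keypoints = np.array([[0.0, 0.0], [2.0, 1.0], [3.2, 2.4], [5.0, 3.0]])
static_keypoints = filter_static_keypoints(keypoints, label_map)
print(static_keypoints)  # the two keypoints outside the dynamic region
```

In a full system, the surviving static keypoints would then be passed to the pose estimator; that stage is outside the scope of this sketch.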

