Journal of Computer Applications, 2021, Vol. 41, Issue (10): 2945-2951. DOI: 10.11772/j.issn.1001-9081.2020111885

Special Issue: Multimedia computing and computer simulation

• Multimedia computing and computer simulation •

Semantic SLAM algorithm based on deep learning in dynamic environment

ZHENG Sicheng1,2, KONG Linghua1,2, YOU Tongfei1,2, YI Dingrong3   

  1. School of Mechanical and Automotive Engineering, Fujian University of Technology, Fuzhou Fujian 350118, China;
    2. Digital Fujian Industrial Manufacturing IoT Lab(Fujian University of Technology), Fuzhou Fujian 350118, China;
    3. College of Mechanical Engineering and Automation, Huaqiao University, Xiamen Fujian 361021, China
  • Received: 2020-12-01 Revised: 2021-04-06 Online: 2021-10-10 Published: 2021-05-12
  • Supported by:
    This work is partially supported by the Surface Program of National Natural Science Foundation of China (51775200).


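  • Corresponding author: YI Dingrong
  • About the authors: ZHENG Sicheng (1996-), male, born in Fuzhou, Fujian, M.S. candidate; research interests: visual simultaneous localization and mapping, deep learning. KONG Linghua (1963-), male, Canadian, professor, Ph.D.; research interests: 3D vision, multispectral detection. YOU Tongfei (1994-), male, born in Fuzhou, Fujian, M.S. candidate; research interests: visual simultaneous localization and mapping, deep learning. YI Dingrong (1969-), female, born in Hechuan, Chongqing, professor, Ph.D.; research interests: 3D vision, microscopic 3D topography.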

Abstract: Moving objects in application scenes reduce the positioning accuracy and robustness of visual Simultaneous Localization And Mapping (SLAM) systems. To address this problem, a visual SLAM algorithm for dynamic environments based on semantic information was proposed. Firstly, the traditional visual SLAM front end was combined with the YOLOv4 object detection algorithm, so that the input image was semantically segmented while its ORB (Oriented FAST and Rotated BRIEF) features were being extracted. Then, the object type was judged to obtain the regions occupied by dynamic objects in the image, and the feature points located on dynamic objects were eliminated. Finally, the camera pose was solved by inter-frame matching between the remaining feature points and those of adjacent frames. Test results on the TUM dataset show that, in highly dynamic environments, the pose estimation accuracy of the proposed algorithm is 96.78% higher than that of ORB-SLAM2, and that its tracking thread takes 0.0655 s per frame on average, the shortest time among the compared SLAM algorithms designed for dynamic environments. These results indicate that the proposed algorithm can achieve real-time, accurate localization and mapping in dynamic environments.

Key words: visual Simultaneous Localization And Mapping (SLAM), semantic information, object detection algorithm, feature point, dynamic environment
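
The feature-elimination step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Detection` type, the `filter_static_keypoints` helper, and the choice of `"person"` as the only dynamic class are assumptions made for illustration, and the keypoints and boxes are made-up values standing in for ORB features and YOLOv4 detections.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Tuple

Point = Tuple[float, float]

# Assumed label set: the abstract does not say which classes count as dynamic.
DYNAMIC_CLASSES: FrozenSet[str] = frozenset({"person"})


@dataclass(frozen=True)
class Detection:
    """Hypothetical detector output: a class label and a pixel bounding box."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        """Return True if pixel (x, y) lies inside the box (borders included)."""
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def filter_static_keypoints(
    keypoints: Sequence[Point],
    detections: Sequence[Detection],
    dynamic_classes: FrozenSet[str] = DYNAMIC_CLASSES,
) -> List[Point]:
    """Keep only keypoints that fall outside every dynamic-object box."""
    dynamic_boxes = [d for d in detections if d.label in dynamic_classes]
    return [
        (x, y)
        for (x, y) in keypoints
        if not any(box.contains(x, y) for box in dynamic_boxes)
    ]


if __name__ == "__main__":
    # Illustrative values only, not taken from the TUM dataset.
    keypoints = [(50.0, 60.0), (210.0, 180.0), (400.0, 300.0)]
    detections = [
        Detection("person", 200, 100, 320, 460),  # treated as dynamic
        Detection("chair", 380, 280, 450, 350),   # treated as static
    ]
    print(filter_static_keypoints(keypoints, detections))
    # → [(50.0, 60.0), (400.0, 300.0)]
```

Only the keypoint inside the `person` box is discarded; the one on the `chair` survives because that class is not in the assumed dynamic set. The surviving points would then go to inter-frame matching for pose estimation.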

