Semantic SLAM algorithm based on deep learning in dynamic environment

doi:10.11772/j.issn.1001-9081.2020111885

Journal of Computer Applications, 2021, Vol. 41, Issue (10): 2945-2951.

Topic: Multimedia Computing and Computer Simulation



ZHENG Sicheng^1,2, KONG Linghua^1,2, YOU Tongfei^1,2, YI Dingrong^3

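1. School of Mechanical and Automotive Engineering, Fujian University of Technology, Fuzhou Fujian 350118, China;
2. Digital Fujian Industrial Manufacturing IoT Lab (Fujian University of Technology), Fuzhou Fujian 350118, China;
3. College of Mechanical Engineering and Automation, Huaqiao University, Xiamen Fujian 361021, China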

Received: 2020-12-01 Revised: 2021-04-06 Online: 2021-05-12 Published: 2021-10-10
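Corresponding author: YI Dingrong
About the authors: ZHENG Sicheng (1996-), male, born in Fuzhou, Fujian, M. S. candidate. His research interests include visual simultaneous localization and mapping, and deep learning. KONG Linghua (1963-), male, Canadian, Ph. D., professor. His research interests include three-dimensional vision and multispectral detection. YOU Tongfei (1994-), male, born in Fuzhou, Fujian, M. S. candidate. His research interests include visual simultaneous localization and mapping, and deep learning. YI Dingrong (1969-), female, born in Hechuan, Chongqing, Ph. D., professor. Her research interests include three-dimensional vision and microscopic three-dimensional topography.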
Supported by:
General Program of National Natural Science Foundation of China (51775200).


Abstract



Abstract: Moving objects in the application scene reduce the localization accuracy and robustness of visual Simultaneous Localization And Mapping (SLAM) systems. To address this problem, a visual SLAM algorithm for dynamic environments based on semantic information was proposed. Firstly, the traditional visual SLAM front end was combined with the YOLOv4 object detection algorithm: while ORB (Oriented FAST and Rotated BRIEF) features were extracted from the input image, the same image was semantically segmented. Then, the object types were judged to obtain the regions occupied by dynamic objects in the image, and the feature points lying on those objects were eliminated. Finally, the camera pose was solved by inter-frame matching between the remaining feature points and the adjacent frames. Test results on the TUM dataset show that, in highly dynamic environments, the pose estimation accuracy of the proposed algorithm is 96.78% higher than that of ORB-SLAM2, and the average processing time per frame of its tracking thread is 0.0655 s, the shortest among the compared SLAM algorithms designed for dynamic environments. These results indicate that the proposed algorithm can achieve real-time, accurate localization and mapping in dynamic environments.
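The key front-end step is discarding ORB feature points that fall inside the image regions of dynamic objects before inter-frame matching. The following is a minimal sketch of that filtering step, not the authors' code. It assumes the detector returns axis-aligned boxes `(x1, y1, x2, y2)` in pixels with a class label, that keypoints are `(x, y)` pixel coordinates (for example the `pt` field of OpenCV `cv2.KeyPoint` objects), and that "person" is the only dynamic class; the `Detection` type, `filter_static_keypoints`, and the class set are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import AbstractSet, Iterable, List, Sequence, Tuple

Point = Tuple[float, float]

# Hypothetical assumption: the set of classes treated as dynamic.
# The abstract does not list them; "person" is only an illustrative choice.
DEFAULT_DYNAMIC_CLASSES: AbstractSet[str] = frozenset({"person"})


@dataclass
class Detection:
    """One detector output: class label plus an axis-aligned box in pixels."""
    label: str
    x1: float
    y1: float
    x2: float
    y2: float

    def contains(self, pt: Point) -> bool:
        """True if the point lies inside (or on the border of) the box."""
        x, y = pt
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2


def filter_static_keypoints(
    keypoints: Sequence[Point],
    detections: Iterable[Detection],
    dynamic_classes: AbstractSet[str] = DEFAULT_DYNAMIC_CLASSES,
) -> List[int]:
    """Return indices of keypoints lying outside every dynamic-object box.

    Indices (rather than points) are returned so the caller can select the
    matching ORB descriptors with the same list before inter-frame matching.
    """
    # Keep only boxes whose class is considered dynamic.
    dynamic_boxes = [d for d in detections if d.label in dynamic_classes]
    return [
        i
        for i, pt in enumerate(keypoints)
        if not any(box.contains(pt) for box in dynamic_boxes)
    ]


if __name__ == "__main__":
    kps = [(10.0, 10.0), (50.0, 50.0), (200.0, 120.0)]
    dets = [
        Detection("person", 40, 40, 100, 100),  # dynamic: its keypoints are dropped
        Detection("chair", 190, 110, 210, 130),  # static class: its keypoints are kept
    ]
    print(filter_static_keypoints(kps, dets))  # [0, 2]
```

A box-based mask like this also discards static background points that happen to lie inside a dynamic object's bounding box; a per-pixel segmentation mask would remove fewer static points at a higher computational cost.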
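The abstract reports a 96.78% accuracy gain over ORB-SLAM2 but does not define the metric here. Assuming it follows the common convention for TUM RGB-D results, namely the relative reduction of the absolute trajectory error (ATE) root-mean-square error (RMSE), the quantities can be written as below, where $Q_i$ is the ground-truth pose, $P_i$ the estimated pose, and $S$ the rigid transformation aligning the two trajectories:

```latex
F_i = Q_i^{-1} S P_i, \qquad
\mathrm{RMSE}(F_{1:n}) = \left( \frac{1}{n} \sum_{i=1}^{n} \left\lVert \operatorname{trans}(F_i) \right\rVert^2 \right)^{1/2}

\eta = \frac{\mathrm{RMSE}_{\text{ORB-SLAM2}} - \mathrm{RMSE}_{\text{proposed}}}{\mathrm{RMSE}_{\text{ORB-SLAM2}}} \times 100\%
```

Under this definition, $\eta = 96.78\%$ means the proposed algorithm's ATE RMSE is about $1 - 0.9678 = 3.22\%$ of ORB-SLAM2's on the high-dynamic sequences.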

Key words: visual Simultaneous Localization And Mapping (SLAM), semantic information, object detection algorithm, feature point, dynamic environment

CLC number: TP242.6


ZHENG Sicheng, KONG Linghua, YOU Tongfei, YI Dingrong. Semantic SLAM algorithm based on deep learning in dynamic environment[J]. Journal of Computer Applications, 2021, 41(10): 2945-2951.

