Journal of Computer Applications, 2024, Vol. 44, Issue (7): 2055-2064. DOI: 10.11772/j.issn.1001-9081.2023060749

• Artificial intelligence •

Mobile robot 3D space path planning method based on deep reinforcement learning

Tian MA1, Runtao XI1,2,3, Jiahao LYU1,4, Yijie ZENG1, Jiayi YANG1, Jiehui ZHANG1

    1.College of Computer Science and Technology, Xi’an University of Science and Technology, Xi’an Shaanxi 710016, China
    2.Changzhou Research Institute, China Coal Technology Engineering Group, Changzhou Jiangsu 213015, China
    3.Tiandi (Changzhou) Automation Company Limited, Changzhou Jiangsu 213015, China
    4.School of Computer Science and Engineering, Xi’an University of Technology, Xi’an Shaanxi 710048, China
  • Received: 2023-06-15 Revised: 2023-08-19 Accepted: 2023-08-24 Online: 2023-09-11 Published: 2024-07-10
  • Contact: Jiahao LYU
  • About the authors: MA Tian, born in 1982, Ph. D., associate professor. His research interests include graphics and image processing, data visualization.
    XI Runtao, born in 1995, M. S., assistant engineer. His research interests include reinforcement learning, computer vision.
    ZENG Yijie, born in 2000, M. S. candidate. His research interests include path planning.
    YANG Jiayi, born in 1989, Ph. D., associate professor. His research interests include intelligent sensors and monitoring systems.
    ZHANG Jiehui, born in 1982, Ph. D., lecturer. Her research interests include computer vision.
    First author contact: LYU Jiahao, born in 1997, M. S. candidate. His research interests include reinforcement learning, path planning.
  • Supported by:
    National Key Research and Development Program of China (2021YFB4000905); National Natural Science Foundation of China (62101432); Shaanxi Natural Science Fundamental Research Program Project (2022JM-508)

3D path planning method for mobile robots based on deep reinforcement learning

Tian MA1, Runtao XI1,2,3, Jiahao LYU1,4, Yijie ZENG1, Jiayi YANG1, Jiehui ZHANG1

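    1.College of Computer Science and Technology, Xi’an University of Science and Technology, Xi’an 710016
    2.Changzhou Research Institute Company Limited, China Coal Technology Engineering Group, Changzhou, Jiangsu 213015
    3.Tiandi (Changzhou) Automation Company Limited, Changzhou, Jiangsu 213015
    4.School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048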
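  • Corresponding author: LYU Jiahao
  • About the authors: MA Tian (born 1982), male, from Shangqiu, Henan, associate professor, Ph. D., CCF senior member. Research interests: graphics and image processing, data visualization.
    XI Runtao (born 1995), male, from Zhangjiakou, Hebei, assistant engineer, M. S. Research interests: reinforcement learning, computer vision.
    ZENG Yijie (born 2000), male, from Wuxi, Jiangsu, M. S. candidate, CCF student member. Research interests: path planning.
    YANG Jiayi (born 1989), male, from Xi’an, Shaanxi, associate professor, Ph. D. Research interests: intelligent sensors and monitoring systems.
    ZHANG Jiehui (born 1982), female, from Shaoyang, Hunan, lecturer, Ph. D. Research interests: computer vision.
    First contact: LYU Jiahao (born 1997), male, from Jingyang, Shaanxi, M. S. candidate. Research interests: reinforcement learning, path planning.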
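  • Funding:
    National Key Research and Development Program of China (2021YFB4000905); National Natural Science Foundation of China (62101432); Shaanxi Natural Science Fundamental Research Program Project (2022JM-508)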

Abstract:

To address the high complexity and uncertainty of unknown 3D environments, a 3D path planning method for mobile robots based on deep reinforcement learning was proposed, using an optimization strategy under a limited observation space. First, depth map information was used as the agent’s input in the limited observation space, to simulate complex 3D environments that are unknown and in which movement is restricted. Second, a two-stage action selection policy over a discrete action space was designed, comprising directional actions and movement actions, to reduce the number of search steps and the search time. Finally, a Gated Recurrent Unit (GRU) was added to the Proximal Policy Optimization (PPO) algorithm to incorporate historical state information, which enhances the stability of the search policy in unknown environments and thereby improves the accuracy and smoothness of the planned path. The experimental results show that, compared with Advantage Actor-Critic (A2C), the proposed method reduces the average search time by 49.07% and the average planned path length by 1.04%. The proposed method can also complete multi-target path planning tasks under linear temporal logic constraints.
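
The two-stage action selection described in the abstract can be illustrated with a minimal sketch. The abstract names only "directional actions" and "movement actions", so everything concrete here is an assumption made for illustration: the six axis-aligned headings in `DIRECTIONS`, the step sizes in `MOVES`, the helper names, and the placeholder logits that stand in for the outputs of a trained policy network.

```python
import math
import random

# Hypothetical action sets: the abstract does not list the actual actions,
# so these values are assumptions chosen only to make the sketch concrete.
DIRECTIONS = {
    "+x": (1, 0, 0), "-x": (-1, 0, 0),
    "+y": (0, 1, 0), "-y": (0, -1, 0),
    "+z": (0, 0, 1), "-z": (0, 0, -1),
}
MOVES = (0, 1, 2)  # number of grid cells to advance along the heading


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(options, logits, rng):
    """Draw one option with probability given by softmax(logits)."""
    return rng.choices(list(options), weights=softmax(logits), k=1)[0]


def two_stage_step(position, dir_logits, move_logits, rng):
    """Stage 1 picks a heading; stage 2 picks how far to move along it."""
    heading = sample(DIRECTIONS, dir_logits, rng)
    distance = sample(MOVES, move_logits, rng)
    dx, dy, dz = DIRECTIONS[heading]
    x, y, z = position
    new_position = (x + dx * distance, y + dy * distance, z + dz * distance)
    return heading, distance, new_position


rng = random.Random(0)
# Placeholder logits; in a learned policy these would come from the network.
heading, distance, pos = two_stage_step(
    (0, 0, 0), [2.0, 0.1, 0.1, 0.1, 0.1, 0.1], [0.1, 1.0, 0.5], rng
)
print(heading, distance, pos)
```

With these assumed sets, factoring the choice needs 6 + 3 policy outputs rather than 6 × 3 joint ones, which is one plausible reason a two-stage design could shorten the search; the abstract itself does not state the mechanism.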

Key words: deep reinforcement learning, mobile robot, three-dimensional path planning, Proximal Policy Optimization (PPO), depth map

Abstract:

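To address the high complexity and uncertainty of unknown 3D environments, a 3D path planning method for mobile robots based on deep reinforcement learning is proposed, using an optimization strategy under a limited observation space. First, depth map information is used as the agent’s input in the limited observation space, to simulate complex 3D environments that are unknown and in which movement is restricted. Second, an action selection policy over a two-stage discrete action space is designed, comprising directional actions and displacement actions, to reduce the number of search steps and the search time. Finally, a Gated Recurrent Unit (GRU) is added to the Proximal Policy Optimization (PPO) algorithm to incorporate historical state information, which improves the stability of the search policy in unknown environments and thereby the accuracy and smoothness of the planned path. The experimental results show that, compared with A2C (Advantage Actor-Critic), the proposed method shortens the average search time by 49.07% and the average planned path length by 1.04%, and that it can complete multi-target path planning tasks under linear temporal logic constraints.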
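
For readers unfamiliar with PPO, the clipped surrogate objective at its core can be sketched in a few lines. This is the standard PPO formulation, not code from the paper: the function name `ppo_clip_loss` is hypothetical, the default `clip_eps=0.2` is a commonly used value rather than one reported here, and the GRU that carries historical state is omitted entirely.

```python
import math


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean negative clipped surrogate objective over a batch of steps.

    clip_eps=0.2 is an assumed default, not a value taken from the paper.
    """
    terms = []
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        terms.append(min(ratio * adv, clipped * adv))
    return -sum(terms) / len(terms)


# Illustrative numbers only: one step where the new policy doubles the
# probability of an action that had a positive advantage.
loss = ppo_clip_loss([math.log(2.0)], [0.0], [1.0])
print(loss)
```

The clipping removes the incentive to move the new policy far from the old one in a single update, which is the usual motivation for choosing PPO as a base algorithm.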

Key words: deep reinforcement learning, mobile robot, three-dimensional path planning, proximal policy optimization, depth map

CLC Number: