Journal of Computer Applications (《计算机应用》), sole official website


基于深度强化学习的移动机器人三维路径规划方法

马天1,席润韬2,吕佳豪3,曾奕杰3,杨嘉怡3,张杰慧3   

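  1. Xi'an University of Science and Technology
  2. China Coal Technology and Engineering Group (CCTEG) Changzhou Research Institute Co., Ltd.; Tiandi (Changzhou) Automation Co., Ltd.
  3. College of Computer Science and Technology, Xi'an University of Science and Technology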
  • Received: 2023-06-12 Revised: 2023-08-21 Online: 2023-09-11 Published: 2023-09-11
  • Corresponding author: 吕佳豪
  • Supported by:
    National Key Research and Development Program of China; National Natural Science Foundation of China; Natural Science Basic Research Program of Shaanxi Province

3D path planning method for mobile robots based on deep reinforcement learning

Abstract: To address the high complexity and uncertainty of unknown 3D environments, a 3D path planning method for mobile robots based on deep reinforcement learning is proposed, using an optimization strategy over a limited observation space. First, depth map information within the limited observation space is used as the agent's input, which simulates a complex, unknown 3D environment in which movement is restricted. Second, an action selection policy over a two-stage discrete action space, comprising directional actions and translation actions, is designed to reduce the number of search steps and the search time. Finally, a Gated Recurrent Unit (GRU) is added to the Proximal Policy Optimization (PPO) algorithm to incorporate historical state information, which improves the stability of the search policy in unknown environments and, in turn, the accuracy and smoothness of the planned path. Experimental results show that, compared with Advantage Actor-Critic (A2C), the proposed method reduces the average search time by 49.07% and the average planned path length by 1.03%, and that it can also complete multi-target path planning tasks under linear temporal logic constraints.

Key words: deep reinforcement learning, mobile robot, three-dimensional path planning, Proximal Policy Optimization (PPO), depth map
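
The two-stage discrete action selection mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the action names, the number of actions, and the choice to condition the translation logits on the sampled direction are all assumptions made for illustration, since the abstract does not specify them. The sketch samples a directional action first, then a translation action, and returns the joint log-probability that a policy-gradient method such as PPO would typically need.

```python
import math
import random

# Hypothetical action sets: the abstract does not list the actual actions.
DIRECTION_ACTIONS = ["yaw_left", "yaw_right", "pitch_up", "pitch_down", "keep"]
TRANSLATION_ACTIONS = ["forward_short", "forward_long", "hold"]


def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(probs, rng):
    """Draw one index from a categorical distribution."""
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1  # guard against floating-point round-off


def select_action(direction_logits, translation_logits, rng):
    """Stage 1 picks a direction; stage 2 picks a translation given that direction.

    translation_logits is assumed to hold one list of scores per direction.
    Returns the two action names and the joint log-probability.
    """
    d_probs = softmax(direction_logits)
    d = sample_index(d_probs, rng)
    t_probs = softmax(translation_logits[d])
    t = sample_index(t_probs, rng)
    log_prob = math.log(d_probs[d]) + math.log(t_probs[t])
    return DIRECTION_ACTIONS[d], TRANSLATION_ACTIONS[t], log_prob


if __name__ == "__main__":
    rng = random.Random(0)
    # Placeholder scores; in practice these would come from the policy network.
    direction_logits = [0.2, 0.1, -0.3, 0.0, 0.5]
    translation_logits = [[0.1, 0.4, -0.2] for _ in DIRECTION_ACTIONS]
    print(select_action(direction_logits, translation_logits, rng))
```

Factoring the choice into two small categorical draws keeps each output head small (5 and 3 entries here instead of 15 joint actions), which is one plausible reason a two-stage scheme could reduce search effort; whether this matches the paper's motivation is not stated in the abstract.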

CLC Number: