Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (2): 654-660. DOI: 10.11772/j.issn.1001-9081.2021122053

• Frontier and Comprehensive Applications •

Mobile robot path planning based on improved SAC algorithm

Yongdi LI, Caihong LI, Yaoyu ZHANG, Guosheng ZHANG

  1. School of Computer Science and Technology, Shandong University of Technology, Zibo, Shandong 255049, China
  • Received: 2021-12-09 Revised: 2022-02-28 Accepted: 2022-03-07 Online: 2023-02-08 Published: 2023-02-10
  • Contact: Caihong LI
  • About author: LI Yongdi, born in 1996 in Zibo, Shandong, M. S. candidate. His research interests include detection and control.
    ZHANG Yaoyu, born in 1998 in Weifang, Shandong, M. S. candidate. Her research interests include detection and control.
    ZHANG Guosheng, born in 1997 in Tengzhou, Shandong, M. S. candidate. His research interests include detection and control.
  • Supported by:
    National Natural Science Foundation of China (61473179)

Abstract:

To address the long training time and slow convergence of the SAC (Soft Actor-Critic) algorithm in the local path planning of mobile robots, a PER-SAC algorithm was proposed by introducing the Prioritized Experience Replay (PER) technique. Firstly, samples were drawn from the experience pool by priority instead of with equal probability, so that the network trained preferentially on the samples with larger errors, which improved the convergence speed and stability of the robot training process. Secondly, the calculation of the Temporal-Difference (TD) error was optimized to reduce the training bias. Thirdly, transfer learning was applied to train the robot step by step from simple to complex environments, thereby increasing the training speed. In addition, an improved reward function was designed to increase the robot's intrinsic reward, which solved the problem of sparse environmental rewards. Finally, simulations were carried out on the ROS (Robot Operating System) platform. The simulation results show that, in different obstacle environments, the PER-SAC algorithm converges faster and plans shorter paths than the original algorithm; it also reduces the training time and is clearly superior to the original algorithm in path planning performance.

Key words: mobile robot, local path planning, Soft Actor-Critic (SAC) algorithm, Prioritized Experience Replay (PER), ROS (Robot Operating System) platform

CLC number: