Mobile robot path planning based on improved SAC algorithm

doi:10.11772/j.issn.1001-9081.2021122053

Abstract

Abstract:

To solve the long training time and slow convergence problems when applying SAC （Soft Actor-Critic） algorithm to the local path planning of mobile robots， a PER-SAC algorithm was proposed by introducing the Prioritized Experience Replay （PER） technique. Firstly， to improve the convergence speed and stability of the robot training process， a priority strategy was applied to extract samples from the experience pool instead of the traditional random sampling and the network prioritized the training of samples with larger errors. Then， the calculation of Temporal-Difference （TD） error was optimized， and the training deviation was reduced. Next， the transfer learning was used to train the robot from a simple environment to a complex one gradually in order to improve the training speed. In addition， an improved reward function was designed to increase the intrinsic reward of robots， and therefore， the sparsity problem of environmental reward was solved. Finally， the simulation was carried out on the ROS （Robot Operating System） platform， and the simulation results show that PER-SAC algorithm outperforms the original algorithm in terms of convergence speed and length of the planned path in different obstacle environments. Moreover， the PER-SAC algorithm can reduce the training time and is significantly better than the original algorithm on path planning performance.

Key words: mobile robot, local path planning, Soft Actor-Critic (SAC) algorithm, Prioritized Experience Replay (PER), ROS (Robot Operating System) platform

摘要：

为解决SAC算法在移动机器人局部路径规划中训练时间长、收敛速度慢等问题，通过引入优先级经验回放（PER）技术，提出了PER-SAC算法。首先从等概率从经验池中随机抽取样本变为按优先级抽取，使网络优先训练误差较大的样本，从而提高了机器人训练过程的收敛速度和稳定性；其次优化时序差分（TD）误差的计算，以降低训练偏差；然后利用迁移学习，使机器人从简单环境到复杂环境逐步训练，从而提高训练速度；另外，设计了改进的奖励函数，增加机器人的内在奖励，从而解决了环境奖励稀疏的问题；最后在ROS平台上进行仿真测试。仿真结果表明，在不同的障碍物环境中，PER-SAC算法均比原始算法收敛速度更快、规划的路径长度更短，并且PER-SAC算法能够减少训练时间，在路径规划性能上明显优于原始算法。

关键词: 移动机器人, 局部路径规划, SAC算法, 优先级经验回放, ROS平台

CLC Number:

TP273

Yongdi LI, Caihong LI, Yaoyu ZHANG, Guosheng ZHANG. Mobile robot path planning based on improved SAC algorithm[J]. Journal of Computer Applications, 2023, 43(2): 654-660.

李永迪, 李彩虹, 张耀玉, 张国胜. 基于改进SAC算法的移动机器人路径规划[J]. 《计算机应用》唯一官方网站, 2023, 43(2): 654-660.

Figures/Tables 23

Fig. 1 Network framework of SAC algorithm

Fig. 2 Network input and output

Fig. 3 SAC network structure

Fig. 4 Lidar data acquisition structure

Fig. 5 Transfer learning structure

Tab. 1 Simulation parameter setting

参数	数值	参数	数值
学习率	0.000 3	每轮学习经验数量	256
折扣系数	0.99	训练轮数	200
隐藏层神经元个数	512	每轮训练步数最大值	500
经验池容量	50 000

Fig. 6 Obstacle-free simulation environment

Fig. 7 Comparison of average reward per round of obstacle-free environment

Fig. 8 Path planning in obstacle-free environment

Fig. 9 Discrete obstacle simulation environment

Fig. 10 Comparison of average reward per round in discrete obstacle environment

Fig. 11 Path planning in discrete obstacle environment

Fig. 12 U-shaped obstacle simulation environment

Fig. 13 Comparison of average reward per round in U-shaped obstacle environment

Fig. 14 Path planning in U-shaped obstacle environment

Fig. 15 1-shaped obstacle simulation environment

Fig. 16 Path planning in one-shaped obstacle environment

Fig. 17 The first mixed obstacle environment

Fig. 18 Path planning in the first mixed obstacle environment

Fig. 19 The second mixed obstacle environment

Fig. 20 Path planning in the second mixed obstacle environment

Tab. 2 Algorithm convergence time

障碍物类型	开始收敛轮数		稳定收敛轮数
障碍物类型	PER-SAC算法	SAC算法	PER-SAC算法	SAC算法
无障碍物	30	60	65	85
离散障碍物	35	60	110	130
U型障碍物	50	90	135	180

Tab. 3 Number of steps reaching target

障碍物类型	PER-SAC算法	SAC算法
无障碍物	115	118
离散障碍物	248	257
U型障碍物	274	298
一型障碍物	183	226
混合障碍物一	271	304
混合障碍物二	279	310

References 17

1	郭凯红，李博昊，宗晓瑞，等. 基于改进人工势场的避障路径规划策略研究［J］. 舰船科学技术， 2021， 43（7）： 54-57.
	GUO K H， LI B H， ZONG X R， et al. Research on path planning based on improved artificial potential field［J］. Ship Science and Technology， 2021， 43（7）： 54-57.
2	汪四新，谭功全，蒋沁，等. 基于改进A*算法的移动机器人路径规划［J］. 计算机仿真， 2021， 38（9）： 386-389， 404. 10.3969/j.issn.1006-9348.2021.09.077
	WANG S X， TAN G Q， JIANG Q， et al. Path planning for mobile robot based on improved A* algorithm［J］. Computer Simulation， 2021， 38 （9）： 386-389， 404. 10.3969/j.issn.1006-9348.2021.09.077
3	GURUJI A K， AGARWAL H， PARSEDIYA D K. Time efficient A* algorithm for robot path planning［J］. Procedia Technology， 2016， 23： 144-149. 10.1016/j.protcy.2016.03.010
4	张松灿，普杰信，司彦娜，等. 蚁群算法在移动机器人路径规划中的应用综述［J］. 计算机工程与应用， 2020， 56（8）：10-19.
	ZHANG S C， PU J X， SI Y N， et al. Survey on application of ant colony algorithm in path planning of mobile robot［J］. Computer Engineering and Applications， 2020， 56（8）： 10-19.
5	MIRJALILI S， DONG J S， LEWIS A. Ant colony optimizer： theory， literature review， and application in AUV path planning： methods and applications［M］// MIRJALILI S， DONG J S， LEWIS A.Nature-Inspired Optimizers： Theories， Literature Reviews and Applications， SCI 811. Cham： Springer， 2020： 7-21. 10.1007/978-3-030-12127-3_2
6	李少波，宋启松，李志昂，等. 遗传算法在机器人路径规划中的研究综述［J］. 科学技术与工程， 2020， 20（2）：423-431. 10.3969/j.issn.1671-1815.2020.02.001
	LI S B， SONG Q S， LI Z A， et al. Review of genetic algorithm in robot path planning［J］. Science Technology and Engineering， 2020， 20（2）： 423-431. 10.3969/j.issn.1671-1815.2020.02.001
7	吕柏行，郭志光，赵韦皓，等. 标准粒子群算法的优化方式综述［J］. 科学技术创新， 2021（28）：33-37. 10.3969/j.issn.1673-1328.2021.28.013
	LYU B X， GUO Z G， ZHAO W H， et al. A review on optimization methods of standard particle swarm optimization［J］. Scientific and Technological Innovation， 2021（28）： 33-37. 10.3969/j.issn.1673-1328.2021.28.013
8	王彩霞，林寿英. 基于深度学习的多目标跟踪算法综述［J］. 中阿科技论坛（中英文）， 2021（10）：118-120. 10.1109/cac53003.2021.9727659
	WANG C X， LIN S Y. A survey on multi-target tracking algorithm based on deep learning［J］. China-Arab States Science and Technology Forum， 2021（10）： 118-120. 10.1109/cac53003.2021.9727659
9	李茹杨，彭慧民，李仁刚，等. 强化学习算法与应用综述［J］. 计算机系统应用， 2020， 29（12）：13-25.
	LI R Y， PENG H M， LI R G， et al. Overview on algorithms and applications for reinforcement learning［J］. Computer Systems and Applications， 2020， 29（12）： 13-25.
10	张荣霞，武长旭，孙同超，等. 深度强化学习及在路径规划中的研究进展［J］. 计算机工程与应用， 2021， 57（19）：44-56.
	ZHANG R X， WU C X， SUN T C， et al. Progress on deep reinforcement learning in path planning［J］. Computer Engineering and Applications， 2021， 57（19）： 44-56.
11	HAARNOJA T， ZHOU A， ABBEEL P， et al. Soft actor-critic： off-policy maximum entropy deep reinforcement learning with a stochastic actor［C］// Proceedings of the 35th International Conference on Machine Learning. New York： JMLR.org， 2018： 1861-1870. 10.1109/icra.2018.8460756
12	HAARNOJA T， ZHOU A， HARTIKAINEN K， et al. Soft actor-critic algorithms and applications［EB/OL］. （2019-01-29）［2020-06-28］..
13	DE JESUS J C， KICH V A， KOLLING A H， et al. Soft actor-critic for navigation of mobile robots［J］. Journal of Intelligent and Robotic Systems， 2021， 102（2）： No.31. 10.1007/s10846-021-01367-5
14	肖硕，黄珍珍，张国鹏，等. 基于SAC的多智能体深度强化学习算法［J］. 电子学报， 2021， 49（9）：1675-1681. 10.12263/DZXB.20200243
	XIAO S， HUANG Z Z， ZHANG G P， et al. Deep reinforcement learning algorithm of multi-agent based on SAC［J］. Acta Electronica Sinica， 2021， 49（9）： 1675-1681. 10.12263/DZXB.20200243
15	单麒源，张智豪，张耀心，等. 基于SAC算法的矿山应急救援智能车快速避障控制［J］. 黑龙江科技大学学报， 2021， 31（1）：14-20. 10.3969/j.issn.2095-7262.2021.01.003
	SHAN Q Y， ZHANG Z H， ZHANG Y X， et al. High speed obstacle avoidance control of mine emergency rescue intelligent vehicle based on SAC algorithm［J］. Journal of Heilongjiang University of Science and Technology， 2021， 31（1）： 14-20. 10.3969/j.issn.2095-7262.2021.01.003
16	胡仕柯，赵海军. 基于改进柔性演员评论家算法的研究［J］. 太原师范学院学报（自然科学版）， 2021， 20（3）：48-52.
	HU S K， ZHAO H J. Research on algorithm based on improved soft actor-critic［J］. Journal of Taiyuan Normal University （Natural Science Edition）， 2021， 20（3）： 48-52.
17	姜玉斌，刘全，胡智慧. 带最大熵修正的行动者评论家算法［J］. 计算机学报， 2020， 43（10）：1897-1908. 10.11897/SP.J.1016.2020.01897
	JIANG Y B， LIU Q， HU Z H. Actor-critic algorithm with maximum-entropy correction［J］. Chinese Journal of Computers， 2020， 43（10）： 1897-1908. 10.11897/SP.J.1016.2020.01897

[1]	Ying HU, Zhihuan CHEN. Trajectory tracking control of wheeled mobile robots under side-slip and slip [J]. Journal of Computer Applications, 2024, 44(7): 2294-2300.
[2]	Tian MA, Runtao XI, Jiahao LYU, Yijie ZENG, Jiayi YANG, Jiehui ZHANG. Mobile robot 3D space path planning method based on deep reinforcement learning [J]. Journal of Computer Applications, 2024, 44(7): 2055-2064.
[3]	Ziyang SONG, Junhuai LI, Huaijun WANG, Xin SU, Lei YU. Path planning algorithm of manipulator based on path imitation and SAC reinforcement learning [J]. Journal of Computer Applications, 2024, 44(2): 439-444.
[4]	Yuanchao LI, Chongben TAO, Chen WANG. Gait control method based on maximum entropy deep reinforcement learning for biped robot [J]. Journal of Computer Applications, 2024, 44(2): 445-451.
[5]	Enbao QIAO, Xiangyang GAO, Jun CHENG. Self-recovery adaptive Monte Carlo localization algorithm based on support vector machine [J]. Journal of Computer Applications, 2024, 44(10): 3246-3251.
[6]	Dongying ZHU, Yong ZHONG, Guanci YANG, Yang LI. Research progress on motion segmentation of visual localization and mapping in dynamic environment [J]. Journal of Computer Applications, 2023, 43(8): 2537-2545.
[7]	Bingqi SHEN, Zhiming ZHANG, Shaolong SHU. UWB-VIO integrated indoor positioning algorithm for mobile robots [J]. Journal of Computer Applications, 2022, 42(12): 3924-3930.
[8]	LI Kairong, LIU Shuang, HU Qianqian, TANG Yiyuan. Improved ant colony optimization algorithm for path planning based on turning angle constraint [J]. Journal of Computer Applications, 2021, 41(9): 2560-2568.
[9]	LU Guoqing, SUN Hao. Two-dimensional mapping of swarm robot based on random walk [J]. Journal of Computer Applications, 2021, 41(7): 2121-2127.
[10]	LUAN Jianing, ZHANG Wei, SUN Wei, ZHANG Ao, HAN Dong. High-accuracy localization algorithm based on fusion of two-dimensional code vision and laser lidar [J]. Journal of Computer Applications, 2021, 41(5): 1484-1491.
[11]	Erchao LI, Kuankuan QI. Robot path planning based on B-spline curve and ant colony algorithm [J]. Journal of Computer Applications, 2021, 41(12): 3558-3564.
[12]	LIU Ang, JIANG Jin, XU Kefeng. Robot path planning based on improved ant colony and pigeon inspired optimization algorithm [J]. Journal of Computer Applications, 2020, 40(11): 3366-3372.
[13]	WANG Kun, ZENG Guohui, LU Dunke, HUANG Bo, LI Xiaobin. Path planning of mobile robot based on improved asymptotically-optimal bidirectional rapidly-exploring random tree algorithm [J]. Journal of Computer Applications, 2019, 39(5): 1312-1317.
[14]	CHEN Ruonan, WEN Congcong, PENG Ling, YOU Chengzeng. Application of improved A^* algorithm in indoor path planning for mobile robot [J]. Journal of Computer Applications, 2019, 39(4): 1006-1011.
[15]	HUANG Chao, LIANG Shengtao, ZHANG Yi, ZHANG Jie. Path planning of mobile robot based on multi-objective grasshopper optimization algorithm [J]. Journal of Computer Applications, 2019, 39(10): 2859-2864.