Journal of Computer Applications ›› 2015, Vol. 35 ›› Issue (12): 3491-3496.DOI: 10.11772/j.issn.1001-9081.2015.12.3491

• Artificial intelligence •

Multi-Agent path planning algorithm based on hierarchical reinforcement learning and artificial potential field

ZHENG Yanbin1,2, LI Bo1, AN Deyu1, LI Na1   

  1. College of Computer and Information Engineering, Henan Normal University, Xinxiang Henan 453007, China;
    2. Henan Engineering Laboratory of Intellectual Business and Internet of Things Technologies, Xinxiang Henan 453007, China
  • Received:2015-06-15 Revised:2015-07-10 Online:2015-12-10 Published:2015-12-10

  • Corresponding author: LI Bo (born 1989), male, from Kaifeng, Henan; M.S. candidate; research interests: virtual reality, multi-agent systems.
  • About the authors: ZHENG Yanbin (born 1964), male, from Neixiang, Henan; professor, Ph.D.; research interests: virtual reality, multi-agent systems, game theory. AN Deyu (born 1990), female, from Xinxiang, Henan; M.S. candidate; research interests: virtual reality, multi-agent systems. LI Na (born 1992), female, from Xinxiang, Henan; M.S. candidate; research interests: virtual reality.
  • Funding:
    Key Science and Technology Program of Henan Province (132102210537, 132102210538).

Abstract: To address the slow convergence and low efficiency of path planning algorithms, a multi-Agent path planning algorithm based on hierarchical reinforcement learning and artificial potential field was proposed. Firstly, the operating environment of the multi-Agent system was modeled as an artificial potential field, in which the potential energy of every point, representing the maximal reward obtainable under the optimal strategy, was determined from prior knowledge. Secondly, the model-free learning and local-update capabilities of hierarchical reinforcement learning were exploited to restrict strategy updates to a smaller local space or a lower-dimensional high-level space, improving the performance of the learning algorithm. Finally, the proposed algorithm was evaluated on the taxi problem in a grid environment. To better approximate the real environment and improve the portability of the algorithm, it was further verified in a three-dimensional simulation environment. The experimental results show that the algorithm converges quickly and that the convergence process is stable.
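As a hedged illustration of the abstract's core idea — seeding each state's value estimate from an artificial potential field (the estimated maximal reward reachable from that state) so that reinforcement learning converges faster — the sketch below runs tabular Q-learning on a small grid world. The grid size, reward values, and distance-based potential are illustrative assumptions, not the authors' exact formulation or their hierarchical decomposition.

```python
import random

# Hypothetical 5x5 grid: agent starts at (0, 0), goal at (4, 4).
SIZE = 5
GOAL = (4, 4)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up
GAMMA, ALPHA, EPS = 0.9, 0.5, 0.1

def potential(state):
    """Attractive potential: higher (less negative) closer to the goal."""
    return -(abs(state[0] - GOAL[0]) + abs(state[1] - GOAL[1]))

def step(state, action):
    """Move within the grid; reaching the goal ends the episode."""
    nx = min(max(state[0] + action[0], 0), SIZE - 1)
    ny = min(max(state[1] + action[1], 0), SIZE - 1)
    nxt = (nx, ny)
    reward = 10.0 if nxt == GOAL else -1.0
    return nxt, reward, nxt == GOAL

# Initialize Q from the potential field instead of zeros, so the
# greedy policy is biased toward the goal from the first episode.
Q = {(x, y): {a: float(potential((x, y))) for a in ACTIONS}
     for x in range(SIZE) for y in range(SIZE)}

random.seed(0)
for _ in range(200):  # training episodes with epsilon-greedy exploration
    s, done = (0, 0), False
    while not done:
        a = (random.choice(ACTIONS) if random.random() < EPS
             else max(Q[s], key=Q[s].get))
        s2, r, done = step(s, a)
        target = r + (0.0 if done else GAMMA * max(Q[s2].values()))
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = s2

# Greedy rollout from the start: the learned policy should reach the goal.
s, path = (0, 0), [(0, 0)]
for _ in range(20):
    s, _, done = step(s, max(Q[s], key=Q[s].get))
    path.append(s)
    if done:
        break
print(path[-1])
```

The potential-field initialization acts like reward shaping: the agent begins with a rough ordering of states by desirability, so far fewer exploratory episodes are wasted before the value estimates point toward the goal.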

Key words: path planning, Multi-Agent System (MAS), hierarchical reinforcement learning, artificial potential field, prior knowledge

