[1] PARKER L E. Multiple mobile robot systems[M]//Springer Handbook of Robotics. Berlin: Springer, 2008: 921-941.
[2] CHAKRABORTY J, MUKHOPADHYAY S. A robust cooperative multi-robot path-planning in noisy environment[C]//Proceedings of the 2010 IEEE International Conference on Industrial and Information Systems. Piscataway: IEEE, 2010: 626-631.
[3] DAI B, XIAO X, CAI Z. Current status and future development of mobile robot path planning technology[J]. Control Engineering of China, 2005, 12(3): 198-202. (戴博,肖晓明,蔡自兴.移动机器人路径规划技术的研究现状与展望[J].控制工程,2005,12(3):198-202.)
[4] SHI L, LUO Q, HAN B, et al. Research in biomimetic experiment of hexapod robot[J]. Journal of System Simulation, 2008, 20(19): 5384-5387.
[5] JARADAT M, GARIBEH M H, FEILAT E A. Dynamic motion planning for autonomous mobile robot using fuzzy potential field[C]//Proceedings of the 6th International Symposium on Mechatronics and Its Applications. Piscataway: IEEE, 2009: 24-26.
[6] GHATEE M, MOHADES A. Motion planning in order to optimize the length and clearance applying a Hopfield neural network[J]. Expert Systems with Applications, 2009, 36(3): 4688-4695.
[7] XU Y, YAO Y. Research on AUV global path planning considering ocean current[J]. Ship Building of China, 2008, 49(4): 109-114. (徐玉如,姚耀中.考虑海流影响的水下机器人全局路径规划研究[J].中国造船,2008,49(4):109-114.)
[8] HAO D, LIU B. Behavior fusion path planning method for mobile robot based on fuzzy logic[J]. Computer Engineering and Design, 2009, 30(3): 660-663. (郝冬,刘斌.基于模糊逻辑行为融合路径规划方法[J].计算机工程与设计,2009,30(3):660-663.)
[9] SONG Y, LI Y, LI C. Initialization in reinforcement learning for mobile robots path planning[J]. Control Theory & Applications, 2012, 29(12): 1623-1628. (宋勇,李贻斌,李彩虹.移动机器人路径规划强化学习的初始化[J].控制理论与应用,2012,29(12):1623-1628.)
[10] BARTO A G, MAHADEVAN S. Recent advances in hierarchical reinforcement learning[J]. Discrete Event Dynamic Systems, 2003, 13(4): 341-379.
[11] SABATTINI L, SECCHI C, FANTUZZI C. Arbitrarily shaped formations of mobile robots: artificial potential fields and coordinate transformation[J]. Autonomous Robots, 2011, 30(4): 385-397.
[12] KHATIB O. Real-time obstacle avoidance for manipulators and mobile robots[C]//Proceedings of the 1985 IEEE International Conference on Robotics and Automation. Piscataway: IEEE, 1985, 2: 500-505.
[13] LIANG T. A speedup convergent method for multi-agent reinforcement learning[C]//Proceedings of the 2009 International Conference on Information Engineering and Computer Science. Piscataway: IEEE, 2009: 1-4.
[14] SUTTON R S, PRECUP D, SINGH S P. Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning[J]. Artificial Intelligence, 1999, 112(1/2): 181-211.
[15] PARR R. Hierarchical control and learning for Markov decision processes[D]. Berkeley: University of California, 1998: 17-109.
[16] DIETTERICH T G. Hierarchical reinforcement learning with the MAXQ value function decomposition[J]. Journal of Artificial Intelligence Research, 2000, 13(1): 227-303.
[17] SHEN J, LIU H, ZHANG R, et al. Multi-robot hierarchical reinforcement learning based on semi-Markov games[J]. Journal of Shandong University: Engineering Science, 2010, 40(4): 1-7. (沈晶,刘海波,张汝波,等.基于半马尔可夫对策的多机器人分层强化学习[J].山东大学学报:工学版,2010,40(4):1-7.)
[18] SINGH S P, JAAKKOLA T, LITTMAN M L, et al. Convergence results for single-step on-policy reinforcement learning algorithms[J]. Machine Learning, 2000, 38(3): 287-308.