[1] HART P E, NILSSON N J, RAPHAEL B. A formal basis for the heuristic determination of minimum cost paths[J]. IEEE Transactions on Systems Science and Cybernetics, 1968, 4(2): 100-107.
[2] YU Z, YU X, KOUDAS N, et al. Distributed processing of k shortest path queries over dynamic road networks[C]//Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. New York: ACM, 2020: 665-679.
[3] GLASIUS R, KOMODA A, GIELEN S C A M. Neural network dynamics for path planning and obstacle avoidance[J]. Neural Networks, 1995, 8(1): 125-133.
[4] 孟宪权, 赵英男, 薛青. 遗传算法在路径规划中的应用[J]. 计算机工程, 2008, 34(16): 215-217, 220. (MENG X Q, ZHAO Y N, XUE Q. Application of genetic algorithm in path planning[J]. Computer Engineering, 2008, 34(16): 215-217, 220.)
[5] SUTTON R S, BARTO A G. Reinforcement Learning: An Introduction[M]. Cambridge: MIT Press, 2017: 140-141, 167-172.
[6] LAMINI C, FATHI Y, BENHLIMA S. H-MAS architecture and reinforcement learning method for autonomous robot path planning[C]//Proceedings of the 2017 Intelligent Systems and Computer Vision. Piscataway: IEEE, 2017: 1-7.
[7] 乔俊飞, 侯占军, 阮晓钢. 基于神经网络的强化学习在避障中的应用[J]. 清华大学学报(自然科学版), 2008, 48(S2): 1747-1750. (QIAO J F, HOU Z J, RUAN X G. Application of neural network-based reinforcement learning to obstacle avoidance[J]. Journal of Tsinghua University (Science and Technology), 2008, 48(S2): 1747-1750.)
[8] WANG Y H, LI T H S, LIN C J. Backward Q-learning: the combination of Sarsa algorithm and Q-learning[J]. Engineering Applications of Artificial Intelligence, 2013, 26(9): 2184-2193.
[9] GOSAVI A. Reinforcement learning: a tutorial survey and recent advances[J]. INFORMS Journal on Computing, 2009, 21(2): 178-192.
[10] WANG S. A method of path planning for mobile robot in dynamic and unknown environment[J]. Electrical Engineering, 2010, 45(2): 36-42.
[11] 王军红, 江虹, 黄玉清, 等. 基于RPkNN-Sarsa(λ)强化学习的机器人路径规划方法[J]. 计算机应用研究, 2013, 30(1): 199-201. (WANG J H, JIANG H, HUANG Y Q, et al. Method of RPkNN-Sarsa(λ) reinforcement learning for robot path planning[J]. Application Research of Computers, 2013, 30(1): 199-201.)
[12] ANDRECUT M, ALI M K. Deep-Sarsa: a reinforcement learning algorithm for autonomous navigation[J]. International Journal of Modern Physics C, 2001, 12(10): 1513-1523.
[13] 林联明, 王浩, 王一雄. 基于神经网络的Sarsa强化学习算法[J]. 计算机技术与发展, 2006, 30(1): 30-32. (LIN L M, WANG H, WANG Y X. Sarsa reinforcement learning algorithm based on neural networks[J]. Computer Technology and Development, 2006, 30(1): 30-32.)
[14] VIET H H, AN S H, CHUNG T C. Dyna-Q-based vector direction for path planning problem of autonomous mobile robots in unknown environments[J]. Advanced Robotics, 2013, 27(3): 159-173.
[15] 朱美强. 基于谱图理论的强化学习研究[D]. 徐州: 中国矿业大学, 2012: 55-86. (ZHU M Q. Reinforcement learning based on spectral graph theory[D]. Xuzhou: China University of Mining and Technology, 2012: 55-86.)
[16] 史豪斌, 徐梦, 刘珈妤, 等. 一种基于Dyna-Q学习的旋翼无人机视觉伺服智能控制方法[J]. 控制与决策, 2019, 34(12): 2517-2526. (SHI H B, XU M, LIU J Y, et al. A visual servo intelligent control method for rotor UAV based on Dyna-Q learning[J]. Control and Decision, 2019, 34(12): 2517-2526.)
[17] 余伶俐, 魏亚东, 霍淑欣. 基于MCPDDPG的智能车辆路径规划方法及应用[J/OL]. 控制与决策 [2019-10-09]. https://kns.cnki.net/kcms/detail/detail.aspx?doi=10.13195/j.kzyjc.2019.0460. (YU L L, WEI Y D, HUO S X. The method and application of intelligent vehicle path planning based on MCPDDPG[J/OL]. Control and Decision [2019-10-09]. https://kns.cnki.net/kcms/detail/detail.aspx?doi=10.13195/j.kzyjc.2019.0460.)
[18] 黄颖, 余玉琴. 一种基于稠密卷积网络和竞争架构的改进路径规划算法[J]. 计算机与数字工程, 2019, 47(4): 812-819. (HUANG Y, YU Y Q. An improved path planning algorithm based on densely connected convolutional network and dueling network architecture[J]. Computer and Digital Engineering, 2019, 47(4): 812-819.)
[19] 刘涛, 王淑灵, 詹乃军. 多机器人路径规划的安全性验证[J]. 软件学报, 2017, 28(5): 1118-1127. (LIU T, WANG S L, ZHAN N J. Safety verification of trajectory planning for multiple robots[J]. Journal of Software, 2017, 28(5): 1118-1127.)
[20] 曾纪钧, 梁哲恒. 监督式强化学习在路径规划中的应用研究[J]. 计算机应用与软件, 2018, 35(10): 185-188, 244. (ZENG J J, LIANG Z H. Research on path planning based on supervised reinforcement learning[J]. Computer Applications and Software, 2018, 35(10): 185-188, 244.)
[21] 解易, 顾益军. 基于Stackelberg策略的多agent强化学习警力巡逻路径规划[J]. 北京理工大学学报, 2017, 37(1): 93-99. (XIE Y, GU Y J. Police patrol path planning using Stackelberg-equilibrium-based multi-agent reinforcement learning[J]. Transactions of Beijing Institute of Technology, 2017, 37(1): 93-99.)
[22] 董培方, 张志安, 梅新虎, 等. 引入势场及陷阱搜索的强化学习路径规划算法[J]. 计算机工程与应用, 2018, 54(16): 129-134. (DONG P F, ZHANG Z A, MEI X H, et al. Reinforcement learning path planning algorithm based on gravitational potential field and trap search[J]. Computer Engineering and Applications, 2018, 54(16): 129-134.)