Journal of Computer Applications, 2021, Vol. 41, Issue (1): 185-190. DOI: 10.11772/j.issn.1001-9081.2020060949

Special Issue: The 8th China Conference on Data Mining (CCDM 2020)

Urban transportation path planning based on reinforcement learning

LIU Sijia, TONG Xiangrong   

  1. School of Computer and Control Engineering, Yantai University, Yantai Shandong 264005, China
  • Received: 2020-05-31; Revised: 2020-07-15; Online: 2021-01-10; Published: 2020-09-02
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61572418).


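  • Corresponding author: TONG Xiangrong (童向荣)
  • About the authors: LIU Sijia (刘思嘉, born 1995), male, from Dezhou, Shandong, M.S. candidate; research interests: reinforcement learning and data mining. TONG Xiangrong (童向荣, born 1975), male, from Zhaoyuan, Shandong, professor, Ph.D., CCF member; research interests: multi-agent systems, distributed artificial intelligence and data mining.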

Abstract: Urban transportation path planning must account for both planning speed and the safety of vehicles along the path, but most existing reinforcement learning algorithms cannot satisfy both requirements. To address this problem, the Dyna framework, which combines model-based and model-free algorithms, was first adopted to improve planning speed. Then, the classical Sarsa algorithm was used as the route selection strategy to improve safety. Finally, the two were combined into an improved Sarsa-based algorithm called Dyna-Sa. Experimental results show that the more planning steps a reinforcement learning algorithm performs in advance, the faster it converges. Compared with the Q-learning, Sarsa and Dyna-Q algorithms on metrics such as convergence speed and number of collisions, the Dyna-Sa algorithm reduces the number of collisions on maps with obstacles, which ensures vehicle safety in the urban traffic environment, and also accelerates convergence.

Key words: path planning, urban transportation, reinforcement learning, Dyna framework, Sarsa algorithm
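
The abstract describes Dyna-Sa only at a high level: Sarsa updates embedded in a Dyna planning loop. The following is a minimal sketch of one plausible reading of that idea, not the authors' implementation. The grid layout, obstacle positions, reward values, hyperparameters and all function names (`step`, `choose`, `dyna_sarsa`) are illustrative assumptions that do not come from the paper.

```python
import random

# Toy map (assumed, not from the paper): 4x6 grid with three obstacle cells.
ROWS, COLS = 4, 6
START, GOAL = (3, 0), (0, 5)
OBSTACLES = {(1, 2), (2, 2), (1, 4)}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Return (next_state, reward, collided) for one move on the toy map."""
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < ROWS and 0 <= c < COLS) or (r, c) in OBSTACLES:
        return state, -10.0, True   # collision: stay in place, assumed penalty
    if (r, c) == GOAL:
        return (r, c), 10.0, False  # assumed goal reward
    return (r, c), -1.0, False      # assumed per-step cost


def choose(q, state, eps, rng):
    """Epsilon-greedy action index; ties go to the first best action."""
    if rng.random() < eps:
        return rng.randrange(len(ACTIONS))
    values = [q.get((state, a), 0.0) for a in range(len(ACTIONS))]
    return values.index(max(values))


def dyna_sarsa(episodes=200, n_planning=10, alpha=0.1, gamma=0.95,
               eps=0.1, seed=0, max_steps=10_000):
    """Sarsa updates on real steps plus n_planning simulated Sarsa updates."""
    rng = random.Random(seed)
    q, model = {}, {}  # Q-table and learned deterministic model

    def update(s, a, r, s2, a2):
        # On-policy (Sarsa) target. Q at GOAL is never updated and stays 0,
        # so the goal behaves as a terminal state.
        target = r + gamma * q.get((s2, a2), 0.0)
        old = q.get((s, a), 0.0)
        q[(s, a)] = old + alpha * (target - old)

    steps_log, collisions_log = [], []
    for _ in range(episodes):
        s = START
        a = choose(q, s, eps, rng)
        steps = collisions = 0
        while s != GOAL and steps < max_steps:
            s2, r, hit = step(s, ACTIONS[a])
            a2 = choose(q, s2, eps, rng)
            update(s, a, r, s2, a2)          # learn from real experience
            model[(s, a)] = (r, s2)          # remember the observed transition
            for _ in range(n_planning):      # planning from the learned model
                ps, pa = rng.choice(list(model))
                pr, ps2 = model[(ps, pa)]
                pa2 = choose(q, ps2, eps, rng)
                update(ps, pa, pr, ps2, pa2)
            s, a = s2, a2
            steps += 1
            collisions += int(hit)
        steps_log.append(steps)
        collisions_log.append(collisions)
    return q, steps_log, collisions_log
```

Calling `dyna_sarsa(n_planning=0)` reduces the sketch to plain Sarsa, so running it with several values of `n_planning` and comparing the per-episode step and collision counts is one way to explore, on this toy map only, the relationship between planning steps and convergence that the abstract reports.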

