《计算机应用》(Journal of Computer Applications) ›› 2022, Vol. 42 ›› Issue (8): 2361-2368. DOI: 10.11772/j.issn.1001-9081.2021061012

• 人工智能 •

基于强化学习的交通情景问题决策优化

罗飞, 白梦伟

  1. 华东理工大学 计算机科学与工程系,上海 200237
  • 收稿日期:2021-06-10 修回日期:2021-10-13 接受日期:2021-10-29 发布日期:2022-01-25 出版日期:2022-08-10
  • 通讯作者: 罗飞
  • 作者简介:罗飞(1978—),男,湖北武汉人,副教授,博士,CCF会员,主要研究方向:认知计算、强化学习;
    白梦伟(1996—),男,河南焦作人,硕士研究生,主要研究方向:强化学习。
  • 基金资助:
    上海市2020年度“科技创新行动计划”项目(20DZ1201400)

Decision optimization of traffic scenario problem based on reinforcement learning

Fei LUO, Mengwei BAI

  1. Department of Computer Science and Engineering,East China University of Science and Technology,Shanghai 200237,China
  • Received:2021-06-10 Revised:2021-10-13 Accepted:2021-10-29 Online:2022-01-25 Published:2022-08-10
  • Contact: Fei LUO
  • About author:LUO Fei, born in 1978, Ph. D., associate professor, CCF member. His research interests include cognitive computing and reinforcement learning.
    BAI Mengwei, born in 1996, M. S. candidate. His research interests include reinforcement learning.
  • Supported by:
    Shanghai 2020 “Science and Technology Innovation Action Plan” Project(20DZ1201400)

摘要:

在复杂交通情景中求解出租车路径规划决策问题和交通信号灯控制问题时,传统强化学习算法在收敛速度和求解精度上存在局限性;因此提出一种改进的强化学习算法求解该类问题。首先,通过优化的贝尔曼公式和快速Q学习(SQL)机制,以及引入经验池技术和直接策略,提出一种改进的强化学习算法GSQL-DSEP;然后,利用GSQL-DSEP算法分别优化出租车路径规划决策问题中的路径长度与交通信号灯控制问题中的车辆总等待时间。相较于Q学习、SQL、广义快速Q学习(GSQL)、Dyna-Q算法,GSQL-DSEP算法在性能测试中降低了至少18.7%的误差,在出租车路径规划决策问题中使决策路径长度至少缩短了17.4%,在交通信号灯控制问题中使车辆总等待时间最多减少了51.5%。实验结果表明,相较于对比算法,GSQL-DSEP算法对解决交通情景问题更具优势。

关键词: 强化学习, 交通情景, 经验池, 马尔可夫决策过程, 决策优化

Abstract:

Traditional reinforcement learning algorithms have limitations in convergence speed and solution accuracy when solving the taxi path planning problem and the traffic signal control problem in complex traffic scenarios. Therefore, an improved reinforcement learning algorithm was proposed to solve this kind of problem. Firstly, by applying an optimized Bellman equation and the Speedy Q-Learning (SQL) mechanism, and by introducing an experience pool and a direct strategy, an improved reinforcement learning algorithm named Generalized Speedy Q-Learning with Direct Strategy and Experience Pool (GSQL-DSEP) was proposed. Then, the GSQL-DSEP algorithm was applied to optimize the path length in the taxi path planning decision problem and the total waiting time of vehicles in the traffic signal control problem. Compared with the Q-learning, SQL, Generalized Speedy Q-Learning (GSQL) and Dyna-Q algorithms, the GSQL-DSEP algorithm reduced the error by at least 18.7% in the performance test, shortened the decision path length by at least 17.4% in the taxi path planning decision problem, and reduced the total waiting time of vehicles by at most 51.5% in the traffic signal control problem. Experimental results show that the GSQL-DSEP algorithm has advantages over the compared algorithms in solving traffic scenario problems.
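The abstract names the building blocks of GSQL-DSEP (an SQL-style update plus an experience pool and a direct strategy) without giving the update rule, which is defined in the full paper. For orientation only, the sketch below combines the standard Speedy Q-Learning update, Q_{k+1}(s,a) = Q_k(s,a) + α[TQ_{k-1}(s,a) − Q_k(s,a)] + (1−α)[TQ_k(s,a) − TQ_{k-1}(s,a)], with a bounded experience pool on a toy 5-state chain MDP; the environment, the fixed learning rate, the asynchronous per-sample update, and all names are assumptions of this illustration, not the paper's GSQL-DSEP.

```python
import random
from collections import deque

# Toy deterministic chain: states 0..4, actions 0 (left) / 1 (right);
# reaching state 4 yields reward 1 and ends the episode.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4
GAMMA = 0.9

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def bellman(q, s2, r, done):
    # Empirical Bellman operator: T Q(s,a) = r + gamma * max_a' Q(s',a')
    return r if done else r + GAMMA * max(q[s2])

def train(episodes=200, alpha=0.5, eps=0.2, pool_size=100, batch=8, seed=0):
    rng = random.Random(seed)
    q_prev = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # Q_{k-1}
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]       # Q_k
    pool = deque(maxlen=pool_size)                         # experience pool
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            a = (rng.randrange(N_ACTIONS) if rng.random() < eps
                 else max(range(N_ACTIONS), key=lambda x: q[s][x]))
            s2, r, done = step(s, a)
            pool.append((s, a, r, s2, done))
            # Replay a mini-batch; apply the Speedy Q-Learning update:
            # Q_{k+1} = Q_k + a*(T Q_{k-1} - Q_k) + (1-a)*(T Q_k - T Q_{k-1})
            for ps, pa, pr, ps2, pd in rng.sample(list(pool), min(batch, len(pool))):
                tq_prev = bellman(q_prev, ps2, pr, pd)
                tq_cur = bellman(q, ps2, pr, pd)
                new = (q[ps][pa] + alpha * (tq_prev - q[ps][pa])
                       + (1 - alpha) * (tq_cur - tq_prev))
                q_prev[ps][pa] = q[ps][pa]
                q[ps][pa] = new
            s = s2
    return q

q = train()
# Greedy action per non-terminal state; moving right leads to the goal.
policy = [max(range(N_ACTIONS), key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

With a fixed α = 0.5 the update reduces to an average of the old estimate and the current Bellman target, so the sketch behaves like tabular Q-learning with replay; the paper's reported gains come from its optimized Bellman equation and direct strategy, which are not reproduced here.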

Key words: reinforcement learning, traffic scenario, experience pool, Markov Decision Process (MDP), decision optimization

中图分类号: