Journal of Computer Applications ›› 2019, Vol. 39 ›› Issue (5): 1495-1499. DOI: 10.11772/j.issn.1001-9081.2018092015

• Frontier, Interdisciplinary and Comprehensive Applications •

Urban traffic signal control based on deep reinforcement learning

SHU Lingzhou (舒凌洲), WU Jia (吴佳), WANG Chen (王晨)

  1. School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu Sichuan 610054, China
  • Received: 2018-10-08 Revised: 2019-01-02 Online: 2019-05-14 Published: 2019-05-10
  • Corresponding author: WU Jia
  • About the authors: SHU Lingzhou (born 1995), male, from Pengzhou, Sichuan, is a master's student whose research interests include deep reinforcement learning and intelligent transportation systems; WU Jia (born 1980), female, from Chengdu, Sichuan, is an associate professor with a Ph.D. whose research interests include deep reinforcement learning and intelligent transportation systems; WANG Chen (born 1995), female, from Xi'an, Shaanxi, is a master's student whose research interest is deep reinforcement learning.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61503059).

Abstract: To make effective use of traffic information in urban traffic signal control while keeping the control algorithm adaptive and robust, a traffic signal control algorithm based on Deep Reinforcement Learning (DRL) was proposed, in which a control Agent constructed from a deep learning network controls the traffic of an entire region. Firstly, the Agent continuously observed the state of the traffic environment and selected the control strategy most likely to be optimal for the current state. The state was represented abstractly by a location matrix and a speed matrix, a representation that captures the vital information about the traffic environment while reducing redundant information. Then, with the objective of maximizing the global vehicle speed within a limited period of time, a reinforcement learning algorithm was employed to continually correct the internal parameters of the Agent according to the impact of the selected strategy on the traffic environment. Finally, after multiple iterations, the Agent learned how to control the traffic effectively. Experiments in the microscopic traffic simulation software Vissim show that, compared with other DRL-based algorithms, the proposed algorithm achieves better results in average global speed, average queue length and stability; compared with the baseline, the average global speed increases by 9% and the average queue length decreases by about 13.4%. The experimental results verify that the proposed algorithm can adapt to complex and dynamically changing traffic environments.

Key words: deep learning, Convolutional Neural Network (CNN), reinforcement learning, traffic signal control
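
The matrix state representation and the speed-based objective described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (encode_state, global_mean_speed), the cell length, the speed normalization and the vehicle tuple layout are all hypothetical choices made for illustration, and the reward is simplified to the mean speed at a single time step.

```python
from typing import List, Sequence, Tuple

# Hypothetical vehicle record: (distance along the road in m, lane index, speed in m/s).
Vehicle = Tuple[float, int, float]


def encode_state(
    vehicles: Sequence[Vehicle],
    road_length: float,
    n_lanes: int,
    cell_length: float,
    max_speed: float,
) -> Tuple[List[List[int]], List[List[float]]]:
    """Discretize a road into cells and build a location matrix and a speed matrix.

    Assumption: one row per lane, one column per cell of `cell_length` meters.
    """
    n_cells = int(road_length // cell_length)
    location = [[0] * n_cells for _ in range(n_lanes)]
    speed = [[0.0] * n_cells for _ in range(n_lanes)]
    for distance, lane, v in vehicles:
        # Clamp to the last cell so a vehicle at the very end of the road is kept.
        cell = min(int(distance // cell_length), n_cells - 1)
        location[lane][cell] = 1
        # Normalize speed to [0, 1]; if two vehicles share a cell, the later one wins.
        speed[lane][cell] = min(v / max_speed, 1.0)
    return location, speed


def global_mean_speed(vehicles: Sequence[Vehicle]) -> float:
    """Mean speed over all vehicles, used here as a stand-in for the reward signal."""
    if not vehicles:
        return 0.0
    return sum(v for _, _, v in vehicles) / len(vehicles)
```

Under these assumptions, calling encode_state on the vehicles of each approach to an intersection yields two matrices of equal shape that could be stacked as input channels for a convolutional network, while global_mean_speed gives a scalar that grows as traffic flows more freely.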

CLC Number: