Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (1): 337-344. DOI: 10.11772/j.issn.1001-9081.2024010066

• Frontier and Comprehensive Applications •

Traffic signal control algorithm based on overall state prediction and fair experience replay

Zijun MIAO, Fei LUO, Weichao DING, Wenbo DONG

  1. School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
  • Received: 2024-01-19  Revised: 2024-03-15  Accepted: 2024-03-25  Online: 2024-05-09  Published: 2025-01-10
  • Corresponding author: Fei LUO
  • About the authors: MIAO Zijun, born in 1999 in Ningbo, Zhejiang, M.S. candidate. His research interests include reinforcement learning.
    DING Weichao, born in 1989 in Qingdao, Shandong, Ph.D., associate professor, CCF member. His research interests include cloud computing, swarm intelligence computing, and federated learning.
    DONG Wenbo, born in 1992 in Xinxiang, Henan, Ph.D. candidate, lecturer, CCF member. His research interests include machine learning and artificial intelligence.
  • Supported by:
    General Program of the National Natural Science Foundation of China (62276097); Natural Science Foundation of Shanghai (22ZR1416500); Shanghai Pilot Program for Basic Research (22TQ1400100-16)

Abstract:

Efficient traffic signal control algorithms designed to relieve congestion can significantly improve the traffic efficiency of vehicles on the existing road network. Although deep reinforcement learning algorithms have shown excellent performance on single-intersection traffic signal control, their application in multi-intersection environments still faces a major challenge: the temporal and spatial partial observability introduced by Multi-Agent Reinforcement Learning (MARL) causes non-stationarity, which prevents these algorithms from converging stably. To address this, a multi-intersection traffic signal control algorithm based on overall state prediction and fair experience replay, IS-DQN, was proposed. On the one hand, the overall state of multiple intersections was predicted from the historical traffic flow information of different lanes, and the state space of IS-DQN was expanded accordingly, so as to avoid the non-stationarity caused by spatial partial observability. On the other hand, to cope with the temporal partial observability of traditional experience replay strategies, a reservoir sampling algorithm was adopted to keep the experience replay pool fair and thus free of non-stationarity. Experimental results of simulations under three different traffic pressures in a complex multi-intersection environment show that, under different traffic flow conditions, and especially under low and medium traffic flows, IS-DQN achieves shorter average vehicle travel time than independent deep reinforcement learning algorithms, together with better convergence performance and convergence stability.

Key words: deep reinforcement learning, traffic signal control, time series prediction, reservoir sampling algorithm, Long Short-Term Memory (LSTM)
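
The page gives no implementation details for the overall state prediction described in the abstract, so the following is only a minimal sketch of the general idea: an LSTM consumes the recent lane-level traffic history and predicts a global state vector that is concatenated with an intersection's local observation before it is fed to that agent's DQN. It assumes PyTorch; the class name StatePredictor, the history length, and the lane counts are illustrative, not taken from the paper.

```python
# Hypothetical sketch of LSTM-based overall state prediction
# (names, shapes, and dimensions are assumptions, not the paper's).
import torch
import torch.nn as nn


class StatePredictor(nn.Module):
    """Predict the next overall state of all intersections from the
    recent history of lane-level traffic flow."""

    def __init__(self, n_lanes: int, hidden_size: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_lanes, hidden_size=hidden_size,
                            batch_first=True)
        self.head = nn.Linear(hidden_size, n_lanes)

    def forward(self, history: torch.Tensor) -> torch.Tensor:
        # history: (batch, time_steps, n_lanes) of observed vehicle counts
        _, (h_n, _) = self.lstm(history)
        return self.head(h_n[-1])        # (batch, n_lanes) predicted state


# One agent's augmented state: its own observation plus the prediction.
predictor = StatePredictor(n_lanes=48)
history = torch.zeros(1, 10, 48)         # last 10 time steps, 48 lanes
local_obs = torch.zeros(1, 12)           # this intersection's own lanes
augmented_state = torch.cat([local_obs, predictor(history)], dim=-1)
```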

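Similarly, the fair experience replay pool can be sketched with reservoir sampling (Algorithm R), which keeps every transition observed so far equally likely to remain in a fixed-size pool, so older experience is not crowded out purely by recency. The class ReservoirReplayBuffer and its methods below are illustrative names under that assumption, not the paper's API.

```python
# Minimal sketch of a reservoir-sampling replay buffer (Algorithm R);
# names are illustrative, not taken from the paper.
import random


class ReservoirReplayBuffer:
    """Fixed-size pool in which every transition seen so far has the
    same probability of being retained."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.buffer = []
        self.n_seen = 0                   # total transitions observed

    def push(self, transition) -> None:
        self.n_seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            # Keep the new item with probability capacity / n_seen by
            # overwriting a uniformly chosen slot, so the retention
            # probability stays uniform over all transitions seen so far.
            idx = random.randrange(self.n_seen)
            if idx < self.capacity:
                self.buffer[idx] = transition

    def sample(self, batch_size: int):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```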