Traffic signal control algorithm based on overall state prediction and fair experience replay

Journal of Computer Applications, 2025, Vol. 45, Issue (1): 337-344. DOI: 10.11772/j.issn.1001-9081.2024010066

Frontier and comprehensive applications

Zijun MIAO, Fei LUO, Weichao DING, Wenbo DONG

School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China

Received: 2024-01-19 Revised: 2024-03-15 Accepted: 2024-03-25 Online: 2024-05-09 Published: 2025-01-10
Contact: Fei LUO
About author: MIAO Zijun, born in 1999 in Ningbo, Zhejiang, M. S. candidate. His research interests include reinforcement learning.
DING Weichao, born in 1989 in Qingdao, Shandong, Ph. D., associate professor, CCF member. His research interests include cloud computing, swarm intelligence computing, and federated learning.
DONG Wenbo, born in 1992 in Xinxiang, Henan, Ph. D. candidate, lecturer, CCF member. His research interests include machine learning and artificial intelligence.
Supported by:
General Program of the National Natural Science Foundation of China (62276097); Natural Science Foundation of Shanghai (22ZR1416500); Shanghai Pilot Program for Basic Research (22TQ1400100-16)

Abstract:

Efficient traffic signal control algorithms designed to cope with traffic congestion can significantly improve the traffic efficiency of vehicles in the existing transportation network. Although deep reinforcement learning algorithms have shown excellent performance in single-intersection traffic signal control, their application in multi-intersection environments still faces a major challenge: the non-stationarity caused by the temporal and spatial partial observability of Multi-Agent Reinforcement Learning (MARL) algorithms, which prevents these algorithms from converging stably. To address this, a multi-intersection traffic signal control algorithm based on overall state prediction and fair experience replay, IS-DQN, was proposed. On the one hand, to avoid the non-stationarity caused by spatial partial observability, the state space of IS-DQN was expanded by predicting the overall state of multiple intersections from the historical traffic flow information of different lanes. On the other hand, to cope with the temporal partial observability introduced by traditional experience replay strategies, a reservoir sampling algorithm was adopted to ensure the fairness of the experience replay pool and thereby avoid the non-stationarity arising in it. Simulation experiments under three different traffic pressures in a complex multi-intersection environment show that, across traffic conditions and especially under low and medium traffic flow, IS-DQN achieves a shorter average vehicle driving time and better convergence performance and convergence stability than independent deep reinforcement learning algorithms.

Key words: deep reinforcement learning, traffic signal control, time series prediction, reservoir sampling algorithm, Long Short-Term Memory (LSTM)
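
The abstract names reservoir sampling as the mechanism that keeps the experience replay pool fair. The following is a minimal sketch of that idea using the standard Algorithm R, not the authors' implementation: the class name `ReservoirReplayBuffer`, its methods, the capacity, and the transition tuple layout are all assumptions made for illustration, and the abstract does not specify how the pool is wired into DQN training.

```python
import random
from typing import Any, List, Optional


class ReservoirReplayBuffer:
    """Fixed-capacity replay pool filled by reservoir sampling (Algorithm R).

    Hypothetical illustration: after n transitions have been offered, each
    of them is in the pool with the same probability capacity / n, so old
    and new experience are represented equally.
    """

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.items: List[Any] = []
        self.seen = 0  # total number of transitions offered so far
        self._rng = random.Random(seed)

    def add(self, transition: Any) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            # The pool is not full yet: always keep the transition.
            self.items.append(transition)
            return
        # Draw j uniformly from [0, seen). The new transition is kept only
        # if j falls inside the pool, i.e. with probability capacity / seen,
        # and then overwrites slot j.
        j = self._rng.randrange(self.seen)
        if j < self.capacity:
            self.items[j] = transition

    def sample(self, batch_size: int) -> List[Any]:
        # Uniform mini-batch without replacement from the current pool.
        k = min(batch_size, len(self.items))
        return self._rng.sample(self.items, k)


# Usage with placeholder (state, action, reward, next_state) tuples.
buffer = ReservoirReplayBuffer(capacity=100, seed=0)
for t in range(10_000):
    buffer.add((f"s{t}", 0, 0.0, f"s{t + 1}"))
batch = buffer.sample(32)
```

Unlike a first-in-first-out replay queue, which retains only the most recent transitions, this pool keeps a uniform sample of the whole history, which is one plausible reading of "fairness" in the abstract.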

CLC Number:

TP181

Zijun MIAO, Fei LUO, Weichao DING, Wenbo DONG. Traffic signal control algorithm based on overall state prediction and fair experience replay[J]. Journal of Computer Applications, 2025, 45(1): 337-344.

Figures/Tables 10

Fig. 1 MDP process of traffic signal control problem

Fig. 2 Structure of prediction network in IS-DQN algorithm

Fig. 3 Overall interaction flow of IS-DQN algorithm

Fig. 4 Simulation diagram of multi-intersection test environment

Tab. 1 Hyperparameter setting of classical algorithms

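Algorithm	Hyperparameter	Value	Meaning
FixTime	$C$	80	Phase cycle
MaxPressure	$φ_{\min}$	5	Minimum green time
SOTL	$φ_{\min}$	2	Minimum green time
	$θ$	4	Vehicle count threshold
	$μ$	28	Green-light vehicle count threshold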

Tab. 2 Average driving time optimized by different algorithms under different traffic pressure

Traffic pressure	Intersection	IS-DQN	DQN	DQN-PS	DQN-ER	FixTime	MaxPressure	SOTL
Low flow	Intersection 1	92.520	103.638	93.750	98.904	285.387	143.212	148.432
	Intersection 2	85.072	93.101	85.771	89.633	279.508	126.871	128.802
	Intersection 3	73.975	83.960	74.444	82.593	272.065	109.183	109.260
	Intersection 4	94.085	99.426	94.440	98.913	400.112	152.070	162.543
	Intersection 5	95.291	99.374	94.877	96.614	349.340	140.418	149.679
	Intersection 6	74.085	81.204	74.266	80.416	345.519	118.796	146.500
Medium flow	Intersection 1	112.058	121.691	114.467	117.215	244.768	156.211	163.154
	Intersection 2	106.165	116.094	108.171	109.196	216.307	141.574	153.677
	Intersection 3	85.988	94.483	88.368	96.213	225.756	117.185	165.453
	Intersection 4	124.960	141.970	126.198	135.349	289.689	171.429	204.084
	Intersection 5	108.599	113.521	110.960	115.294	286.784	143.639	178.838
	Intersection 6	91.078	107.781	92.318	100.988	258.487	135.075	206.624
High flow	Intersection 1	129.778	128.808	127.764	131.759	276.707	166.963	146.500
	Intersection 2	125.551	123.464	124.294	131.589	208.357	149.933	162.850
	Intersection 3	107.029	109.390	114.653	114.301	201.837	124.199	173.341
	Intersection 4	164.431	161.762	161.873	166.006	336.926	188.821	200.651
	Intersection 5	118.135	117.691	121.091	121.621	388.053	148.797	163.518
	Intersection 6	130.814	141.184	130.136	133.617	242.165	145.862	197.414
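
As a reading aid for Tab. 2, the short sketch below averages the low-traffic rows of the IS-DQN and DQN columns over the six intersections and computes the relative reduction. The values are copied from the table. The averaging itself is only an illustration and is not a figure reported by the authors, and the table does not state the time unit.

```python
from statistics import mean

# Low-traffic rows of Tab. 2, intersections 1 to 6 (values copied from the table).
low_flow = {
    "IS-DQN": [92.520, 85.072, 73.975, 94.085, 95.291, 74.085],
    "DQN": [103.638, 93.101, 83.960, 99.426, 99.374, 81.204],
}

# Mean average driving time per algorithm across the six intersections.
averages = {name: mean(values) for name, values in low_flow.items()}

# Relative reduction of IS-DQN with respect to independent DQN.
reduction = 1.0 - averages["IS-DQN"] / averages["DQN"]

for name, value in averages.items():
    print(f"{name}: {value:.3f}")
print(f"relative reduction of IS-DQN vs. DQN: {reduction:.1%}")
```

The same dictionary can be filled with the medium-flow or high-flow rows to compare the other traffic pressures.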

Fig. 5 Convergence process of each algorithm under low traffic flow

Fig. 6 Convergence process of each algorithm under medium traffic flow

Fig. 7 Convergence process of each algorithm under high traffic flow

Fig. 8 Cumulative rewards of DQN and IS-DQN algorithms under medium traffic flow
