Journal of Computer Applications


A Traffic Signal Optimization Method Based on Dual-Policy Network in Reinforcement Learning


  • Received: 2025-06-13  Revised: 2025-08-29  Accepted: 2025-09-09  Online: 2025-09-15  Published: 2025-09-15

WEI Min, HUANG Jian

  1. Xi'an Shiyou University
  • Corresponding author: HUANG Jian
  • Supported by:
    Shaanxi Province Key Research and Development Program (2023-YBGY-219)

Abstract: In recent years, deep reinforcement learning has achieved remarkable progress in intelligent traffic signal control. However, most existing approaches adopt a single-policy network architecture that considers either signal phases or phase durations, which limits flexibility and reduces effectiveness under dynamic, complex traffic conditions. To address this issue, this paper proposes a dual-policy network traffic signal control model, Consistency Light – Hybrid Proximal Policy Optimization (CONLight-HPPO). Leveraging the advantages of the Proximal Policy Optimization (PPO) algorithm, the model employs a consistent and concise design of state and reward representations, and incorporates a hybrid action space that combines discrete phase selection with continuous green-time adjustment, enabling joint optimization of phase and duration. Extensive experiments on the SUMO platform under various typical traffic scenarios show that, compared with single-policy network models, CONLight-HPPO reduces average travel time, average queue length, and average waiting time in the most complex scenario by 5.75%, 23.87%, and 23.85%, respectively. These results confirm the model's adaptability and superiority in complex traffic environments.
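The hybrid action space described in the abstract, a discrete phase choice paired with a continuous green-time adjustment, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation: the linear "heads" standing in for the two policy networks, the weight shapes, the Gaussian standard deviation, and the green-time bounds `g_min`/`g_max` are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_hybrid_action(state, w_phase, w_green, n_phases=4,
                         g_min=5.0, g_max=60.0):
    """Sample a hybrid action: a discrete phase index from a softmax
    (the 'discrete head') and a continuous green time from a clipped
    Gaussian (the 'continuous head'). The linear weights are
    illustrative stand-ins for the two policy networks."""
    # Discrete head: softmax over phase logits
    logits = state @ w_phase                    # shape (n_phases,)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    phase = rng.choice(n_phases, p=probs)
    # Continuous head: per-phase Gaussian mean, clipped to valid green range
    means = state @ w_green                     # shape (n_phases,)
    green = np.clip(rng.normal(means[phase], 2.0), g_min, g_max)
    return phase, float(green), probs

# Toy usage: a 6-dim state (e.g. per-lane queue lengths)
state = np.array([3.0, 1.0, 4.0, 0.0, 2.0, 5.0])
w_phase = rng.normal(size=(6, 4))
w_green = rng.normal(size=(6, 4)) * 2 + 5
phase, green, probs = sample_hybrid_action(state, w_phase, w_green)
print(phase, round(green, 1))
```

In a PPO setting, both heads would share the joint log-probability `log p(phase) + log p(green | phase)` in the clipped surrogate objective, which is what allows the phase and its duration to be optimized jointly rather than by two separate single-policy models.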

Key words: Traffic Signal Control, Deep Reinforcement Learning, Proximal Policy Optimization, Hybrid Action Space, Urban Traffic Lights


