Journal of Computer Applications


A Traffic Signal Optimization Method Based on Dual-Policy Network in Reinforcement Learning


  • Received: 2025-06-13  Revised: 2025-08-29  Accepted: 2025-09-09  Online: 2025-09-15  Published: 2025-09-15

WEI Min, HUANG Jian

  1. Xi'an Shiyou University
  • Corresponding author: HUANG Jian
  • Supported by:
    Shaanxi Province Key Research and Development Program (2023-YBGY-219)

Abstract: In recent years, deep reinforcement learning has achieved remarkable progress in intelligent traffic signal control. However, most existing approaches adopt a single-policy network architecture that considers either signal phases or phase durations, which limits flexibility and reduces effectiveness under dynamic, complex traffic conditions. To address this issue, this paper proposes a dual-policy network traffic signal control model, Consistency Light – Hybrid Proximal Policy Optimization (CONLight-HPPO). Leveraging the advantages of the Proximal Policy Optimization (PPO) algorithm, the model employs a consistent and concise design of state and reward representations, and incorporates a hybrid action space that combines discrete phase selection with continuous green-time adjustment, enabling joint optimization of phase and duration. Extensive experiments on the SUMO platform under various typical traffic scenarios show that, compared with single-policy network models, CONLight-HPPO reduces average travel time, average queue length, and average waiting time in the most complex scenario by 5.75%, 23.87%, and 23.85%, respectively. These results confirm the model's adaptability and superiority in complex traffic environments.
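The hybrid action space described in the abstract, a discrete phase choice paired with a continuous green-time adjustment, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation: the linear "heads" standing in for the two policy networks, the weight shapes, the Gaussian standard deviation, and the green-time bounds `g_min`/`g_max` are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_hybrid_action(state, w_phase, w_green, n_phases=4,
                         g_min=5.0, g_max=60.0):
    """Sample a hybrid action: a discrete phase index from a softmax
    (the 'discrete head') and a continuous green time from a clipped
    Gaussian (the 'continuous head'). The linear weights are
    illustrative stand-ins for the two policy networks."""
    # Discrete head: softmax over phase logits
    logits = state @ w_phase                    # shape (n_phases,)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    phase = rng.choice(n_phases, p=probs)
    # Continuous head: per-phase Gaussian mean, clipped to valid green range
    means = state @ w_green                     # shape (n_phases,)
    green = np.clip(rng.normal(means[phase], 2.0), g_min, g_max)
    return phase, float(green), probs

# Toy usage: a 6-dim state (e.g. per-lane queue lengths)
state = np.array([3.0, 1.0, 4.0, 0.0, 2.0, 5.0])
w_phase = rng.normal(size=(6, 4))
w_green = rng.normal(size=(6, 4)) * 2 + 5
phase, green, probs = sample_hybrid_action(state, w_phase, w_green)
print(phase, round(green, 1))
```

In a PPO setting, both heads would share the joint log-probability `log p(phase) + log p(green | phase)` in the clipped surrogate objective, which is what allows the phase and its duration to be optimized jointly rather than by two separate single-policy models.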

Key words: Traffic Signal Control, Deep Reinforcement Learning, Proximal Policy Optimization, Hybrid Action Space, Urban Traffic Lights


