Journal of Computer Applications, 2024, Vol. 44, Issue (9): 2958-2963. DOI: 10.11772/j.issn.1001-9081.2023091266

• Frontier and comprehensive applications •

Safe reinforcement learning method for decision making of autonomous lane changing based on trajectory prediction

Hailin XIAO1, Tianyi HUANG1, Qiuxiang DAI1, Yuejun ZHANG2, Zhongshan ZHANG3

  1. School of Computer Science and Information Engineering, Hubei University, Wuhan, Hubei 430062, China
  2. Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo, Zhejiang 315000, China
  3. School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China
  • Received: 2023-09-18 Revised: 2023-11-30 Accepted: 2023-12-04 Online: 2023-12-21 Published: 2024-09-10
  • Contact: Hailin XIAO
  • About the authors: HUANG Tianyi, born in 1998, M. S. candidate. His research interests include in-vehicle communications and intelligent vehicle control.
    DAI Qiuxiang, born in 1996, M. S. candidate. Her research interests include wireless communications and intelligent reflecting surfaces.
    ZHANG Yuejun, born in 1982, Ph. D., professor. His research interests include information security chips and low-power integrated circuit design.
    ZHANG Zhongshan, born in 1974, Ph. D., professor. His research interests include 5G/B5G communications.
  • Supported by:
    National Natural Science Foundation of China (61872406); Guangxi Key Research and Development Program (GUIKE AB23026034); Outstanding Young and Middle-Aged Science and Technology Innovation Team Program for Universities of Hubei Province in 2021 (T2021001)

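Safe reinforcement learning method for decision making of autonomous lane changing based on trajectory prediction (基于轨迹预测的安全强化学习自动变道决策方法)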

XIAO Hailin (肖海林)1, HUANG Tianyi (黄天义)1, DAI Qiuxiang (代秋香)1, ZHANG Yuejun (张跃军)2, ZHANG Zhongshan (张中山)3

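  1. School of Computer Science and Information Engineering, Hubei University (湖北大学), Wuhan 430062
  2. Faculty of Electrical Engineering and Computer Science, Ningbo University (宁波大学), Ningbo, Zhejiang 315000
  3. School of Information and Electronics, Beijing Institute of Technology (北京理工大学), Beijing 100081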
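  • Corresponding author: XIAO Hailin (肖海林)
  • About the authors: HUANG Tianyi (born 1998), male, from Xi'an, Shaanxi, M. S. candidate. Research interests: in-vehicle communications, intelligent vehicle control;
    DAI Qiuxiang (born 1996), female, from Suizhou, Hubei, M. S. candidate. Research interests: wireless communications, intelligent reflecting surfaces;
    ZHANG Yuejun (born 1982), male, from Taizhou, Zhejiang, professor, Ph. D. Research interests: information security chips, low-power integrated circuit design;
    ZHANG Zhongshan (born 1974), male, from Zunhua, Hebei, professor, Ph. D. Research interests: 5G/B5G communications.
  • Funding:
    National Natural Science Foundation of China (61872406); Guangxi Key Research and Development Program (GUIKE AB23026034); 2021 Outstanding Young and Middle-Aged Science and Technology Innovation Team Program for Universities of Hubei Province (T2021001)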

Abstract:

In autonomous lane-changing decision making, the trial-and-error nature of deep reinforcement learning easily leads to unsafe actions during training. To address this problem, a safe reinforcement learning method for autonomous lane-changing decision making based on trajectory prediction was proposed. Firstly, the future trajectories of vehicles were predicted through probabilistic modeling based on maximum likelihood estimation. Secondly, driving risk was assessed using the predicted trajectories and a safety distance metric, and actions were constrained according to the assessment results: the action space was cut down to a safe action space, guiding the intelligent vehicle to avoid dangerous actions. The proposed method was tested against Deep Q-Network (DQN) and its improved variants in a freeway scenario on a simulation platform. Experimental results show that, while maintaining fast convergence during intelligent vehicle training, the proposed method reduces the number of collisions by 47%-57% compared with the other methods, effectively improving safety during training.

Key words: safe reinforcement learning, decision making of autonomous lane changing, trajectory prediction, risk assessment, action space cutting
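
The pipeline outlined in the abstract (predict trajectories, assess risk against a safety distance, cut the action space, then act) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The action names, the one-dimensional Gaussian speed model, the constant-speed rollout, the fixed safety distance `d_safe`, and the fallback to `keep_lane` are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math
from dataclasses import dataclass

# Assumed discrete action set and lane offsets; the abstract does not list them.
LANE_OFFSET = {"keep_lane": 0, "change_left": -1, "change_right": 1}


@dataclass
class Vehicle:
    lane: int            # lane index, 0 .. num_lanes - 1
    x: float             # longitudinal position (m)
    speed_history: list  # recent observed speeds (m/s)


def fit_gaussian_mle(samples):
    """Return the maximum-likelihood mean and std of a 1-D Gaussian."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n  # the MLE divides by n
    return mean, math.sqrt(var)


def predict_positions(vehicle, horizon_steps, dt):
    """Current and future positions, assuming the MLE mean speed persists."""
    mean_v, _ = fit_gaussian_mle(vehicle.speed_history)
    return [vehicle.x + mean_v * dt * k for k in range(horizon_steps + 1)]


def safe_action_set(ego, others, num_lanes, d_safe=10.0, horizon_steps=5, dt=0.5):
    """Cut the action space to actions whose predicted gaps stay >= d_safe."""
    ego_path = predict_positions(ego, horizon_steps, dt)
    safe = []
    for action, offset in LANE_OFFSET.items():
        target = ego.lane + offset
        if not 0 <= target < num_lanes:
            continue  # an action that leaves the road is never kept
        risky = any(
            abs(e - o) < d_safe
            for other in others
            if other.lane == target
            for e, o in zip(ego_path, predict_positions(other, horizon_steps, dt))
        )
        if not risky:
            safe.append(action)
    # Assumed fallback: if every action is flagged as risky, stay in lane.
    return safe or ["keep_lane"]


def greedy_safe_action(q_values, safe):
    """Masked argmax: the highest-Q action among the safe ones."""
    return max(safe, key=lambda a: q_values[a])
```

For example, with an ego vehicle in the middle of three lanes and a neighbour 5 m ahead in the left lane travelling at the same speed, `change_left` is removed from the action set even if the Q-network scores it highest, and the agent takes the best remaining action. In a DQN training loop the same mask would also apply to exploratory (random) actions, which is where unsafe trial-and-error behavior mostly arises.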

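Abstract (translated from Chinese):

Owing to its trial-and-error learning, deep reinforcement learning tends to produce unsafe behavior during training in autonomous lane-changing decision problems. To address this, a safe reinforcement learning method for autonomous lane-changing decision making based on trajectory prediction is proposed. First, the future trajectories of vehicles are modeled probabilistically and predicted using maximum likelihood estimation. Second, driving risk is assessed using the predicted trajectories and a safety distance metric, and safe action constraints are applied according to the risk assessment results: the action space is cut down to a safe action space, guiding the intelligent vehicle to avoid dangerous actions. In the freeway scenario of a simulation platform, the proposed method is tested against Deep Q-Network (DQN) and its improved variants. Experimental results show that, during intelligent vehicle training, the proposed method ensures fast convergence while reducing the number of collisions by 47%-57% relative to the compared methods, effectively improving safety during training.

Key words: safe reinforcement learning, decision making of autonomous lane changing, trajectory prediction, risk assessment, action space cutting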

CLC Number: