Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (11): 3666-3673. DOI: 10.11772/j.issn.1001-9081.2024111654

• Advanced computing •

Spatial-temporal Transformer-based hybrid return implicit Q-learning for crowd navigation

Shuai ZHOU1,2, Hao FU1,2, Wei LIU1,2

  1. School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei 430065, China
    2. Hubei Province Key Laboratory of Intelligent Information Processing and Real Time Industrial System, Wuhan, Hubei 430081, China
  • Received: 2024-11-27  Revised: 2025-03-31  Accepted: 2025-04-08  Online: 2025-04-22  Published: 2025-11-10
  • Contact: Hao FU
  • About author: ZHOU Shuai, born in 2000 in Tianmen, Hubei, M. S. candidate. His research interests include offline reinforcement learning and intelligent robots.
    LIU Wei, born in 1998 in Huanggang, Hubei, M. S. candidate. His research interests include multi-robot intelligent control.
  • Supported by:
    National Natural Science Foundation of China (62173262, 62303357); Hubei Provincial Natural Science Foundation (2023AFB109)

Abstract:

In crowded environments, robots typically rely on online reinforcement learning algorithms to perform crowd navigation tasks. However, the complex and dynamic nature of pedestrian movement significantly reduces the sample efficiency of online reinforcement learning. To address this issue, a Spatial-temporal Transformer-based Hybrid Return Implicit Q-Learning (STHRIQL) algorithm within the Offline Reinforcement Learning (ORL) framework was proposed. Firstly, a Monte Carlo (MC) return mechanism was incorporated into the Implicit Q-Learning (IQL) algorithm to improve the convergence of the learning process. Then, a spatial-temporal Transformer model was integrated into the Actor-Critic framework to capture and analyze the highly dynamic and complex interactions between robots and pedestrians in offline crowd navigation datasets, thereby optimizing the training process and efficiency of the algorithm. Finally, simulation experiments were conducted to compare the STHRIQL algorithm with existing online reinforcement learning-based crowd navigation algorithms, followed by quantitative and qualitative analyses based on the evaluation metrics. Experimental results show that the STHRIQL algorithm achieves superior performance in crowd navigation tasks and improves sample efficiency by 30.5% to 55.8% compared with existing online crowd navigation algorithms. This indicates that the STHRIQL algorithm provides a new approach and solution for enhancing robot navigation capabilities in complex crowd environments.
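
As a concrete illustration of the two mechanisms described above (blending a Monte Carlo return into the IQL value target, and attending over both agents and time), the following minimal PyTorch sketch may help. It is not the authors' implementation: the blend weight lam, the tensor layout (batch, time, agents, features), and the module sizes are assumptions of this sketch; only the expectile loss follows the standard IQL objective.

import torch
import torch.nn as nn

def expectile_loss(diff: torch.Tensor, tau: float = 0.7) -> torch.Tensor:
    # IQL's asymmetric L2 loss: |tau - 1(u < 0)| * u^2.
    weight = torch.abs(tau - (diff < 0).float())
    return (weight * diff.pow(2)).mean()

def hybrid_value_loss(q, v, mc_return, lam=0.5, tau=0.7):
    # Blend the Monte Carlo return with the (detached) critic estimate
    # before expectile regression; lam is an assumed mixing weight.
    target = lam * mc_return + (1.0 - lam) * q.detach()
    return expectile_loss(target - v, tau)

class SpatialTemporalEncoder(nn.Module):
    # Attend across agents at each timestep (spatial), then across
    # timesteps for each agent (temporal). Input: (batch, T, N, d).
    def __init__(self, d: int = 64, heads: int = 4):
        super().__init__()
        self.spatial = nn.MultiheadAttention(d, heads, batch_first=True)
        self.temporal = nn.MultiheadAttention(d, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, n, d = x.shape
        s = x.reshape(b * t, n, d)            # agents as the sequence axis
        s, _ = self.spatial(s, s, s)
        s = s.reshape(b, t, n, d).transpose(1, 2).reshape(b * n, t, d)
        s, _ = self.temporal(s, s, s)         # timesteps as the sequence axis
        return s.reshape(b, n, t, d).transpose(1, 2)

In such a sketch, setting lam = 0 recovers the standard IQL value update, while lam = 1 fits the value function directly to the Monte Carlo return.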

Key words: crowd navigation, Deep Reinforcement Learning (DRL), offline learning, neural network, spatial-temporal Transformer
