Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (2): 445-451.DOI: 10.11772/j.issn.1001-9081.2023020153

• Artificial intelligence •

Gait control method based on maximum entropy deep reinforcement learning for biped robot

Yuanchao LI1, Chongben TAO1,2, Chen WANG1

  1. School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China
    2. Suzhou Automotive Research Institute, Tsinghua University, Suzhou, Jiangsu 215134, China
  • Received:2023-02-21 Revised:2023-04-20 Accepted:2023-05-05 Online:2023-08-14 Published:2024-02-10
  • Contact: Chongben TAO
  • About author:LI Yuanchao, born in 1999, M. S. candidate. His research interests include artificial intelligence, biped robot motion control.
    WANG Chen, born in 1990, Ph. D., lecturer. His research interests include biped robot motion control.
  • Supported by:
    National Natural Science Foundation of China(62201375);China Postdoctoral Science Foundation(2021M691848);Natural Science Foundation of Jiangsu Province(BK20220635);Science and Technology Project of Suzhou(SS2019029)



To address the problem of gait stability control for continuous straight-line walking of a biped robot, a Soft Actor-Critic (SAC) gait control algorithm based on maximum entropy Deep Reinforcement Learning (DRL) was proposed. Firstly, no accurate dynamic model of the robot needed to be built in advance: all control parameters were derived from the joint angles, without additional sensors. Secondly, a cosine similarity method was used to classify experience samples, thereby optimizing the experience replay mechanism. Finally, reward functions were designed based on domain knowledge and experience, enabling the biped robot to continuously adjust its attitude during straight-line walking training and ensuring the robustness of straight-line walking. The proposed method was compared with other DRL methods such as PPO (Proximal Policy Optimization) and TRPO (Trust Region Policy Optimization) in the Roboschool simulation environment. The results show that the proposed method not only achieves fast and stable straight-line walking of the biped robot, but also offers better algorithmic robustness.
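The cosine-similarity classification of experience samples mentioned in the abstract can be illustrated with a minimal sketch. The full paper's details are not given here, so the function names, the centroid-based grouping, and the similarity threshold below are illustrative assumptions, not the authors' exact method: each new experience vector (e.g. a flattened state-action transition) is assigned to the group of stored samples it most resembles, and a new group is opened when no group is similar enough.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two flattened experience vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify_experience(sample, buffers, threshold=0.9):
    """Assign a new experience vector to the most similar buffer.

    Each buffer is a list of vectors; similarity is measured against
    the buffer's centroid. Below `threshold`, a new class is started.
    (Hypothetical grouping scheme for illustration only.)
    """
    best_idx, best_sim = None, -1.0
    for i, buf in enumerate(buffers):
        centroid = np.mean(buf, axis=0)
        sim = cosine_similarity(sample, centroid)
        if sim > best_sim:
            best_idx, best_sim = i, sim
    if best_idx is not None and best_sim >= threshold:
        buffers[best_idx].append(sample)
    else:
        buffers.append([sample])
    return buffers

# Usage: a near-duplicate sample joins the existing class,
# while a dissimilar one starts a new class.
buffers = [[np.array([1.0, 0.0])]]
buffers = classify_experience(np.array([0.99, 0.01]), buffers)
buffers = classify_experience(np.array([0.0, 1.0]), buffers)
```

Grouping replayed experiences this way lets the replay mechanism sample across distinct regions of the state space rather than drawing uniformly from a single undifferentiated buffer.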

Key words: biped robot, gait control, deep reinforcement learning, maximum entropy, Soft Actor-Critic (SAC) algorithm



