Journal of Computer Applications ›› 2018, Vol. 38 ›› Issue (9): 2459-2463. DOI: 10.11772/j.issn.1001-9081.2018030714

• Artificial Intelligence •

Walking stability control method based on deep Q-network for biped robot on uneven ground

ZHAO Yuting1, HAN Baoling1, LUO Qingsheng2   

  1. School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100081, China;
  2. School of Mechatronical Engineering, Beijing Institute of Technology, Beijing 100081, China
  • Received:2018-04-09 Revised:2018-05-18 Online:2018-09-10 Published:2018-09-06
  • Contact: HAN Baoling
  • About the authors: ZHAO Yuting (born 1993), female, from Shijiazhuang, Hebei, M.S. candidate; research interests: intelligent robot control strategies, bionic robots. HAN Baoling (born 1957), female, from Hefei, Anhui, Ph.D., professor; research interests: bionic robots, opto-mechatronic integrated design, mechanical CAD. LUO Qingsheng (born 1956), male, from Hanshou, Hunan, Ph.D., professor; research interests: special robots, mechatronics.
  • Supported by:
    This work is partially supported by the National Ministerial Level Advanced Research Foundation (3020020221111).

Abstract: To address the problem that biped robots easily lose motion stability when walking on uneven ground, a gait control method based on Deep Q-Network (DQN), a value-based deep reinforcement learning algorithm, was proposed as an intelligent learning method for posture adjustment. Firstly, an off-line gait for a flat-ground environment was obtained through gait planning of the robot. Secondly, the biped robot was regarded as an agent, and its environment space, state space, action space and Reward-Punishment (RP) mechanism were established; unlike traditional control methods, this process requires no complex dynamic modeling. Finally, through multiple rounds of training, the biped robot learned to adjust its posture on uneven ground and to maintain walking stability. The proposed algorithm was validated in a V-Rep simulation environment. The results show that, with the DQN gait-adjustment learning algorithm, the fluctuation of the robot's posture angles stays within 3° while walking on uneven ground and the walking stability is improved obviously, which realizes the learning of posture adjustment behavior and proves the effectiveness of the proposed method.
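
To make the agent formulation above concrete, the following is a minimal sketch of the kind of DQN posture-adjustment agent described in the abstract, written in Python with PyTorch. It is an illustration only: the state and action definitions, network size, reward shaping and hyperparameters are assumptions made for this sketch rather than the authors' exact setup, and the V-Rep simulation itself is not reproduced here.

```python
# Minimal DQN sketch illustrating the agent formulation in the abstract.
# All concrete choices (state/action dimensions, network size, reward
# shaping, hyperparameters) are assumptions for illustration only.
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM = 4   # assumed state: [roll, pitch, roll_rate, pitch_rate]
N_ACTIONS = 5   # assumed discrete set of posture-adjustment actions
GAMMA = 0.99    # discount factor


class QNet(nn.Module):
    """Small MLP approximating Q(s, a) for each discrete action."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, N_ACTIONS),
        )

    def forward(self, x):
        return self.net(x)


q_net, target_net = QNet(), QNet()
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)   # replay buffer of (s, a, r, s', done) tuples


def select_action(state, epsilon):
    """Epsilon-greedy choice over the discrete posture adjustments."""
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        q = q_net(torch.tensor(state, dtype=torch.float32))
        return int(q.argmax().item())


def reward(state, fell):
    """Illustrative reward-punishment: penalize tilt and falling."""
    roll, pitch = state[0], state[1]
    return -abs(roll) - abs(pitch) - (100.0 if fell else 0.0)


def train_step(batch_size=64):
    """One DQN update: TD targets from the target network, MSE loss."""
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)
    s = torch.tensor([t[0] for t in batch], dtype=torch.float32)
    a = torch.tensor([t[1] for t in batch], dtype=torch.int64)
    r = torch.tensor([t[2] for t in batch], dtype=torch.float32)
    s2 = torch.tensor([t[3] for t in batch], dtype=torch.float32)
    done = torch.tensor([t[4] for t in batch], dtype=torch.float32)

    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + GAMMA * (1.0 - done) * target_net(s2).max(dim=1).values
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In a complete setup, each training episode would read the robot's posture from the simulator, apply the chosen adjustment on top of the offline flat-ground gait, store the resulting transition in replay, call train_step(), and periodically copy q_net's weights into target_net.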

Key words: biped robot, walking stability, gait control, uneven ground, reinforcement learning

CLC number: