Journal of Computer Applications, 2021, Vol. 41, Issue (6): 1799-1804. DOI: 10.11772/j.issn.1001-9081.2020091410

Special Issue: Frontier and comprehensive applications

Motion control method of two-link manipulator based on deep reinforcement learning

WANG Jianping, WANG Gang, MAO Xiaobin, MA Enqi   

  1. School of Mechanical and Precision Instrument Engineering, Xi'an University of Technology, Xi'an, Shaanxi 710048, China
  • Received: 2020-09-11 Revised: 2020-12-15 Online: 2021-06-10 Published: 2020-12-28


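  • Corresponding author: WANG Gang
  • About the authors: WANG Jianping (born 1970), male, from Daixian, Shanxi, associate professor, Ph.D.; main research interests: nonlinear system dynamics, intelligent control. WANG Gang (born 1996), male, from Baoji, Shaanxi, master's student; main research interests: intelligent control, deep reinforcement learning. MAO Xiaobin (born 1998), male, from Linfen, Shanxi, master's student; main research interest: intelligent control. MA Enqi (born 1998), male, from Weinan, Shaanxi, master's student; main research interest: intelligent control.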

Abstract: To address the motion control problem of a two-link manipulator, a new control method based on deep reinforcement learning was proposed. Firstly, a simulation environment was built, comprising the two-link manipulator, a target and an obstacle. Then, three deep reinforcement learning models were established and trained according to the target setting, state variables, and reward and punishment mechanism of the environment model, realizing motion control of the two-link manipulator. After the three models were compared and analyzed, the Deep Deterministic Policy Gradient (DDPG) algorithm was selected for further study and improved for better applicability, so as to shorten the debugging time of the manipulator model and let the manipulator avoid the obstacle and reach the target smoothly. Experimental results show that the proposed deep reinforcement learning method can effectively control the motion of the two-link manipulator, and that the control model based on the improved DDPG algorithm has its convergence speed increased by two times and its stability after convergence enhanced. Compared with the traditional control method, the proposed deep reinforcement learning control method has higher efficiency and stronger applicability.

Key words: deep reinforcement learning, two-link manipulator, motion control, reward and punishment mechanism, Deep Deterministic Policy Gradient (DDPG) algorithm
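
The environment outlined in the abstract (a two-link manipulator, a target, an obstacle, state variables, and a reward and punishment mechanism) could be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the link lengths, target and obstacle positions, state layout, reward values, and the function names `forward_kinematics`, `observe` and `reward` are all assumptions, and only the end-effector (not the links themselves) is checked against the obstacle.

```python
import math

# All constants are illustrative assumptions, not values from the paper.
L1, L2 = 1.0, 1.0          # link lengths
TARGET = (1.2, 0.8)        # target position
OBSTACLE = (0.5, 1.5)      # obstacle centre
OBSTACLE_RADIUS = 0.2      # obstacle treated as a disc
REACH_TOL = 0.05           # distance at which the target counts as reached


def forward_kinematics(theta1, theta2):
    """Return (elbow, tip) positions for joint angles given in radians."""
    elbow = (L1 * math.cos(theta1), L1 * math.sin(theta1))
    tip = (elbow[0] + L2 * math.cos(theta1 + theta2),
           elbow[1] + L2 * math.sin(theta1 + theta2))
    return elbow, tip


def observe(theta1, theta2):
    """Hypothetical state: joint angles plus tip-to-target and tip-to-obstacle offsets."""
    _, tip = forward_kinematics(theta1, theta2)
    return [theta1, theta2,
            TARGET[0] - tip[0], TARGET[1] - tip[1],
            OBSTACLE[0] - tip[0], OBSTACLE[1] - tip[1]]


def reward(theta1, theta2):
    """Hypothetical reward/punishment rule; returns (reward, done)."""
    _, tip = forward_kinematics(theta1, theta2)
    d_target = math.dist(tip, TARGET)
    d_obstacle = math.dist(tip, OBSTACLE)
    if d_obstacle < OBSTACLE_RADIUS:   # punishment: tip hit the obstacle
        return -10.0, True
    if d_target < REACH_TOL:           # reward: tip reached the target
        return 10.0, True
    return -d_target, False            # shaping: closer to the target is better
```

A continuous-action agent such as DDPG would take the vector from `observe` as input, output joint increments or torques, and be trained on the scalar returned by `reward`; the dense distance term is one common way to speed up convergence, though the paper's actual mechanism may differ.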

