Motion control method of two-link manipulator based on deep reinforcement learning

doi:10.11772/j.issn.1001-9081.2020091410

Abstract

Abstract: To address the motion control problem of a two-link manipulator, a new control method based on deep reinforcement learning was proposed. Firstly, a simulation environment was built, consisting of the two-link manipulator, a target and an obstacle. Then, according to the target setting, the state variables and the reward and punishment mechanism of the environment model, three deep reinforcement learning models were established and trained. Finally, motion control of the two-link manipulator was realized. After the three models were compared and analyzed, the Deep Deterministic Policy Gradient (DDPG) algorithm was selected for further research and improved for better applicability, so as to shorten the debugging time of the manipulator model and enable the manipulator to avoid the obstacle and reach the target smoothly. Experimental results show that the proposed deep reinforcement learning method can effectively control the motion of the two-link manipulator, and that the control model based on the improved DDPG algorithm has its convergence speed increased by two times and is more stable after convergence. Compared with traditional control methods, the proposed deep reinforcement learning control method has higher efficiency and stronger applicability.

Key words: deep reinforcement learning, two-link manipulator, motion control, reward and punishment mechanism, Deep Deterministic Policy Gradient (DDPG) algorithm
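
The abstract describes an environment with a two-link manipulator, a target, an obstacle, state variables and a reward and punishment mechanism, but does not give their definitions. The following Python sketch illustrates what such an environment could look like. Everything in it is an assumption made for illustration and not the authors' design: the class name `TwoLinkArmEnv`, the link lengths, the target and obstacle positions, the six-element state vector, the reward values, and the simplification that only the end effector is checked against the obstacle.

```python
import math


class TwoLinkArmEnv:
    """Toy planar two-link arm with a point target and a circular obstacle.

    All parameters and the reward shaping are hypothetical placeholders.
    """

    def __init__(self, l1=1.0, l2=1.0, target=(1.2, 0.8),
                 obstacle=(0.0, 1.5), obstacle_radius=0.2,
                 goal_tolerance=0.05, max_step=0.1):
        self.l1, self.l2 = l1, l2
        self.target = target
        self.obstacle = obstacle
        self.obstacle_radius = obstacle_radius
        self.goal_tolerance = goal_tolerance
        self.max_step = max_step          # largest joint change per step (rad)
        self.theta = [0.0, 0.0]           # joint angles (rad)

    def end_effector(self):
        """Forward kinematics of the planar two-link arm."""
        t1, t2 = self.theta
        x = self.l1 * math.cos(t1) + self.l2 * math.cos(t1 + t2)
        y = self.l1 * math.sin(t1) + self.l2 * math.sin(t1 + t2)
        return x, y

    def state(self):
        """Assumed state: joint angles plus offsets to target and obstacle."""
        x, y = self.end_effector()
        return [self.theta[0], self.theta[1],
                self.target[0] - x, self.target[1] - y,
                self.obstacle[0] - x, self.obstacle[1] - y]

    def step(self, action):
        """Apply a continuous joint-angle increment and return (state, reward, done)."""
        for i in range(2):
            delta = max(-self.max_step, min(self.max_step, action[i]))
            self.theta[i] += delta

        x, y = self.end_effector()
        dist_target = math.hypot(self.target[0] - x, self.target[1] - y)
        dist_obstacle = math.hypot(self.obstacle[0] - x, self.obstacle[1] - y)

        reward = -dist_target             # closer to the target is better
        done = False
        if dist_obstacle < self.obstacle_radius:
            reward -= 10.0                # punishment: end effector hit the obstacle
            done = True
        elif dist_target < self.goal_tolerance:
            reward += 10.0                # reward: target reached
            done = True
        return self.state(), reward, done
```

The continuous action (a pair of joint-angle increments) is the kind of action space that DDPG is designed for, which is why the sketch uses it; a discrete-action algorithm would need the increments to be discretized first.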

WANG Jianping, WANG Gang, MAO Xiaobin, MA Enqi. Motion control method of two-link manipulator based on deep reinforcement learning[J]. Journal of Computer Applications, 2021, 41(6): 1799-1804.
