Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (11): 3629-3638.DOI: 10.11772/j.issn.1001-9081.2023111712

• Frontier and comprehensive applications •

Irregular object grasping by soft robotic arm based on clipped proximal policy optimization algorithm

Jiachen YU1,2, Ye YANG1,2

  1. College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 201418, China
    2. Shanghai Engineering Research Center of Intelligent Education and Bigdata (Shanghai Normal University), Shanghai 200234, China
  • Received:2023-12-11 Revised:2024-03-09 Accepted:2024-03-14 Online:2024-03-22 Published:2024-11-10
  • Contact: Ye YANG
  • About author:YU Jiachen, born in 1998 in Zhoushan, Zhejiang, M. S., CCF member. His research interests include deep reinforcement learning algorithms and intelligent robot control.
  • Supported by:
    National Natural Science Foundation of China(51605298)


Abstract:

To address the poor stability and learning rate of traditional Deep Reinforcement Learning (DRL) algorithms in complex scenes, especially in irregular object grasping with soft robotic arms, a soft robotic arm control strategy based on the Clipped Proximal Policy Optimization (CPPO) algorithm was proposed. By introducing a clipping function, the performance of the Proximal Policy Optimization (PPO) algorithm was improved, increasing its stability and learning efficiency in high-dimensional state spaces. Firstly, the state space and action space of the soft robotic arm were defined, and a soft robotic arm model imitating an octopus tentacle was designed. Secondly, the Matlab toolbox SoRoSim (Soft Robot Simulation) was used for modeling, and an environmental reward function combining continuous and sparse terms was defined. Finally, a Matlab-based simulation platform was constructed; irregular object images were preprocessed through Python scripts and filters, and a Redis cache was used to efficiently transmit the processed contour data to the simulation platform. Comparative experiments with the TRPO (Trust Region Policy Optimization) and SAC (Soft Actor-Critic) algorithms show that the CPPO algorithm achieves a success rate of 86.3% in the task of grasping irregular objects with a soft robotic arm, 3.6 percentage points higher than that of the TRPO algorithm. This indicates that the CPPO algorithm can be applied to soft robotic arm control and provides an important reference for applying soft robotic arms to complex grasping tasks in unstructured environments.
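The clipping function that distinguishes CPPO follows the standard PPO clipped surrogate objective. The abstract does not give the paper's exact formulation, so the sketch below is a minimal NumPy illustration of the general technique; the function name and the clipping range ε = 0.2 are assumptions, not values from the paper:

```python
import numpy as np

def clipped_surrogate_loss(ratio, advantage, epsilon=0.2):
    """Clipped surrogate objective in the PPO style (negated for minimization).

    ratio:     pi_theta(a|s) / pi_theta_old(a|s) for each sampled action
    advantage: estimated advantage for each sampled action
    epsilon:   clipping range; 0.2 is a common default, assumed here
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
    # Take the pessimistic (element-wise minimum) objective, then average;
    # negate so that minimizing the loss maximizes the surrogate objective.
    return -np.mean(np.minimum(unclipped, clipped))
```

Clipping removes the incentive to push the probability ratio outside [1 − ε, 1 + ε] in a single update, which is the mechanism behind the stability gains in high-dimensional state spaces that the abstract reports.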

Key words: Deep Reinforcement Learning (DRL), Proximal Policy Optimization (PPO) algorithm, irregular object detection, soft robotic arm, robotic arm grasping

