Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (11): 3629-3638.DOI: 10.11772/j.issn.1001-9081.2023111712

• Frontier and comprehensive applications •

Irregular object grasping by soft robotic arm based on clipped proximal policy optimization algorithm

Jiachen YU1,2, Ye YANG1,2

  1. College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 201418, China
    2. Shanghai Engineering Research Center of Intelligent Education and Bigdata (Shanghai Normal University), Shanghai 200234, China
  • Received:2023-12-11 Revised:2024-03-09 Accepted:2024-03-14 Online:2024-03-22 Published:2024-11-10
  • Contact: Ye YANG
  • About author:YU Jiachen, born in 1998 in Zhoushan, Zhejiang, M. S., CCF member. His research interests include deep reinforcement learning algorithms and intelligent robot control.
  • Supported by:
    National Natural Science Foundation of China(51605298)


Abstract:

To address the poor stability and learning rate of traditional Deep Reinforcement Learning (DRL) algorithms in complex scenes, especially in irregular object grasping with soft robotic arms, a soft robotic arm control strategy based on the Clipped Proximal Policy Optimization (CPPO) algorithm was proposed. By introducing a clipping function, the performance of the Proximal Policy Optimization (PPO) algorithm was improved, increasing its stability and learning efficiency in high-dimensional state spaces. Firstly, the state space and action space of the soft robotic arm were defined, and a soft robotic arm model imitating an octopus tentacle was designed. Secondly, the Matlab toolbox SoRoSim (Soft Robot Simulation) was used for modeling, and an environmental reward function combining continuous and sparse terms was defined. Finally, a Matlab-based simulation platform was constructed; irregular object images were preprocessed through Python scripts and filters, and a Redis cache was used to efficiently transmit the processed contour data to the simulation platform. Comparative experiments with the TRPO (Trust Region Policy Optimization) and SAC (Soft Actor-Critic) algorithms show that the CPPO algorithm achieves a success rate of 86.3% in the task of grasping irregular objects with a soft robotic arm, 3.6 percentage points higher than that of the TRPO algorithm. This indicates that the CPPO algorithm can be applied to soft robotic arm control and provides an important reference for applying soft robotic arms to complex grasping tasks in unstructured environments.
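The clipping function that distinguishes CPPO follows the standard PPO clipped surrogate objective. The abstract does not give the paper's exact formulation, so the sketch below is a minimal NumPy illustration of the general technique; the function name and the clipping range ε = 0.2 are assumptions, not values from the paper:

```python
import numpy as np

def clipped_surrogate_loss(ratio, advantage, epsilon=0.2):
    """Clipped surrogate objective in the PPO style (negated for minimization).

    ratio:     pi_theta(a|s) / pi_theta_old(a|s) for each sampled action
    advantage: estimated advantage for each sampled action
    epsilon:   clipping range; 0.2 is a common default, assumed here
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
    # Take the pessimistic (element-wise minimum) objective, then average;
    # negate so that minimizing the loss maximizes the surrogate objective.
    return -np.mean(np.minimum(unclipped, clipped))
```

Clipping removes the incentive to push the probability ratio outside [1 − ε, 1 + ε] in a single update, which is the mechanism behind the stability gains in high-dimensional state spaces that the abstract reports.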

Key words: Deep Reinforcement Learning (DRL), Proximal Policy Optimization (PPO) algorithm, irregular object detection, soft robotic arm, robotic arm grasping

