
Irregular object grasping by soft robotic arm based on Clipped Proximal Policy Optimization algorithm

YU Jiachen, YANG Ye

  1. Shanghai Normal University
  • Received: 2023-12-11  Revised: 2024-03-09  Accepted: 2024-03-14  Online: 2024-03-22  Published: 2024-03-22
  • Corresponding author: YANG Ye
  • About the authors: YU Jiachen, born in 1998 in Zhoushan, Zhejiang, M. S. candidate, CCF member; his research interests include deep reinforcement learning algorithms and intelligent robot control. YANG Ye, born in 1985 in Shanghai, Ph. D., associate professor, CCF member; her research interests include reinforcement learning, robot control, and intelligent manufacturing.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (51605298).

Abstract: To address the poor stability and learning rate of traditional deep reinforcement learning algorithms in complex scenes, especially in irregular object grasping and soft robotic arm applications, a soft robotic arm control strategy based on the Clipped Proximal Policy Optimization (CPPO) algorithm was proposed. By introducing a clipping function, this algorithm optimized the performance of the Proximal Policy Optimization (PPO) algorithm and improved its stability and learning efficiency in high-dimensional state spaces. First, the state space and action space of the soft robotic arm were defined, and a soft robotic arm model imitating octopus tentacles was designed. Second, the Matlab SoRoSim toolbox was used for modeling, and an environmental reward function combining continuous and sparse terms was defined. Finally, a Matlab-based simulation platform was built, on which images of irregular objects were preprocessed by Python scripts and filters, and a Redis cache was used to transmit the processed contour data to the simulation platform efficiently. Comparative experiments with the Trust Region Policy Optimization (TRPO) and Soft Actor-Critic (SAC) algorithms showed that the CPPO algorithm achieved a success rate of 86.3% in grasping irregular objects with the soft robotic arm, about 3.6% higher than that of the TRPO algorithm, demonstrating better performance. These results show that the CPPO algorithm can be applied to soft robotic arm control and provide an important reference for the application of soft robotic arms to complex grasping tasks in unstructured environments.
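
The clipping function mentioned above is, in the standard PPO formulation, applied to the policy probability ratio inside the surrogate objective. The exact variant used by CPPO is not spelled out on this page, so the usual clipped objective is reproduced here only as a reference point, with \epsilon denoting the clipping range and \hat{A}_t the advantage estimate:

    L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\left[ \min\left( r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t \right) \right], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}

Clipping the ratio keeps each policy update close to the data-collecting policy, which is the stability property in high-dimensional state spaces that the abstract attributes to CPPO. The "continuous and sparse" reward mentioned above typically pairs a dense distance-to-object shaping term with a sparse bonus paid only on a successful grasp, although the paper's exact formulation and weighting are not given on this page.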

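A minimal sketch of the Python-side preprocessing and Redis hand-off described above is given below. The file name, Redis key, and the Gaussian-blur/Otsu-threshold filter choices are illustrative assumptions, not details taken from the paper:

    # Hypothetical preprocessing script: filter an image of an irregular object,
    # extract its outer contour, and push the contour points to Redis so that the
    # Matlab simulation platform can read them. Key names and filters are assumed.
    import json

    import cv2
    import numpy as np
    import redis


    def extract_contour(image_path: str) -> np.ndarray:
        """Return the largest external contour of the object as an (N, 2) array."""
        gray = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2GRAY)
        blurred = cv2.GaussianBlur(gray, (5, 5), 0)            # suppress image noise
        _, mask = cv2.threshold(blurred, 0, 255,
                                cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        largest = max(contours, key=cv2.contourArea)            # keep the object outline
        return largest.reshape(-1, 2)


    def publish_contour(contour: np.ndarray, key: str = "object:contour") -> None:
        """Serialize the contour and cache it in Redis for the simulation side."""
        client = redis.Redis(host="localhost", port=6379, db=0)
        client.set(key, json.dumps(contour.tolist()))


    if __name__ == "__main__":
        publish_contour(extract_contour("irregular_object.png"))

On the Matlab side, the simulation platform would read and decode the cached JSON string (for example with jsondecode) before constructing the grasp target, consistent with the Redis-based transfer described in the abstract.
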
Key words: Deep Reinforcement Learning (DRL), Proximal Policy Optimization (PPO), Irregular object detection, Soft robotic arm, Robotic arm grasping

CLC number: