
Irregular object grasping by soft robotic arm based on Clipped Proximal Policy Optimization algorithm

YU Jiachen, YANG Ye

  1. Shanghai Normal University
  • Received: 2023-12-11  Revised: 2024-03-09  Accepted: 2024-03-14  Online: 2024-03-22  Published: 2024-03-22
  • Corresponding author: YANG Ye
  • About the authors: YU Jiachen, born in 1998 in Zhoushan, Zhejiang, M. S. candidate, CCF member; his research interests include deep reinforcement learning algorithms and intelligent robot control. YANG Ye, born in 1985 in Shanghai, Ph. D., associate professor, CCF member; her research interests include reinforcement learning, robot control, and intelligent manufacturing.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (51605298).

Abstract: To address the poor stability and learning rate of traditional deep reinforcement learning algorithms in complex scenes, especially in irregular object grasping and soft robotic arm applications, a soft robotic arm control strategy based on the Clipped Proximal Policy Optimization (CPPO) algorithm was proposed. By introducing a clipping function, this algorithm optimized the performance of the Proximal Policy Optimization (PPO) algorithm and improved its stability and learning efficiency in high-dimensional state spaces. First, the state space and action space of the soft robotic arm were defined, and a soft robotic arm model imitating octopus tentacles was designed. Second, the Matlab SoRoSim toolbox was used for modeling, and an environmental reward function combining continuous and sparse terms was defined. Finally, a Matlab-based simulation platform was built, on which images of irregular objects were preprocessed by Python scripts and filters, and a Redis cache was used to transmit the processed contour data to the simulation platform efficiently. Comparative experiments with the Trust Region Policy Optimization (TRPO) and Soft Actor-Critic (SAC) algorithms showed that the CPPO algorithm achieved a success rate of 86.3% in grasping irregular objects with the soft robotic arm, about 3.6% higher than that of the TRPO algorithm, demonstrating better performance. These results show that the CPPO algorithm can be applied to soft robotic arm control and provide an important reference for the application of soft robotic arms to complex grasping tasks in unstructured environments.
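
The clipping function mentioned above is, in the standard PPO formulation, applied to the policy probability ratio inside the surrogate objective. The exact variant used by CPPO is not spelled out on this page, so the usual clipped objective is reproduced here only as a reference point, with \epsilon denoting the clipping range and \hat{A}_t the advantage estimate:

    L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\left[ \min\left( r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t \right) \right], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}

Clipping the ratio keeps each policy update close to the data-collecting policy, which is the stability property in high-dimensional state spaces that the abstract attributes to CPPO. The "continuous and sparse" reward mentioned above typically pairs a dense distance-to-object shaping term with a sparse bonus paid only on a successful grasp, although the paper's exact formulation and weighting are not given on this page.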

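A minimal sketch of the Python-side preprocessing and Redis hand-off described above is given below. The file name, Redis key, and the Gaussian-blur/Otsu-threshold filter choices are illustrative assumptions, not details taken from the paper:

    # Hypothetical preprocessing script: filter an image of an irregular object,
    # extract its outer contour, and push the contour points to Redis so that the
    # Matlab simulation platform can read them. Key names and filters are assumed.
    import json

    import cv2
    import numpy as np
    import redis


    def extract_contour(image_path: str) -> np.ndarray:
        """Return the largest external contour of the object as an (N, 2) array."""
        gray = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2GRAY)
        blurred = cv2.GaussianBlur(gray, (5, 5), 0)            # suppress image noise
        _, mask = cv2.threshold(blurred, 0, 255,
                                cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        largest = max(contours, key=cv2.contourArea)            # keep the object outline
        return largest.reshape(-1, 2)


    def publish_contour(contour: np.ndarray, key: str = "object:contour") -> None:
        """Serialize the contour and cache it in Redis for the simulation side."""
        client = redis.Redis(host="localhost", port=6379, db=0)
        client.set(key, json.dumps(contour.tolist()))


    if __name__ == "__main__":
        publish_contour(extract_contour("irregular_object.png"))

On the Matlab side, the simulation platform would read and decode the cached JSON string (for example with jsondecode) before constructing the grasp target, consistent with the Redis-based transfer described in the abstract.
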
Key words: Deep Reinforcement Learning (DRL), Proximal Policy Optimization (PPO), Irregular object detection, Soft robotic arm, Robotic arm grasping

CLC number: