Journal of Computer Applications
LIANG Yanyang1, XIE Wenxuan2, CUI Wei2, LYU Hongfei2, LI Da2, ZHONG Dongzhou3
Abstract: To address the low training efficiency of end-to-end dynamic grasping for robots, and the difficulty of balancing a high success rate with motion smoothness, a method based on Curriculum Reinforcement Learning (CRL) was proposed. First, a multi-modal input network fusing color images, depth maps, and robot proprioceptive states was constructed to map raw sensory data directly to continuous action commands for the end-effector. Then, a curriculum mechanism in which task difficulty and smoothness constraints increase synchronously was designed; combined with a staged composite reward function, it guided the agent to progressively master grasping, from static objects to dynamic ones. Finally, Domain Randomization (DR) was employed to enhance the policy's simulation-to-reality (Sim-to-Real) transfer capability. Simulation results show that the proposed method achieves a grasping success rate of nearly 100% at target speeds of 0.15 to 0.40 m/s, raising the upper speed limit for stable grasping from 0.25 m/s (the object-detection-based baseline) to 0.40 m/s. Compared with Simple Curriculum Learning (SimpleCL), which increases difficulty only, the proposed method improves the success rate by 3.6 percentage points on the hardest test and reduces the average joint acceleration and jerk norms by 58.35% and 69.26%, respectively. In physical experiments, the grasping success rates for a static scene and two dynamic scenes are 95.0%, 90.0%, and 70.0%, respectively. By jointly optimizing task difficulty and behavioral constraints, the method achieves an effective balance between success rate and smoothness in robotic dynamic grasping.
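The core idea of the abstract, a curriculum in which task difficulty (target speed) and the smoothness constraint tighten together, combined with a staged composite reward, can be sketched as follows. This is a minimal illustration of the concept only: all function names, stage counts, speeds, and reward weights below are hypothetical, not the authors' actual implementation.

```python
# Illustrative sketch: synchronized difficulty/smoothness curriculum
# with a staged composite reward. Numbers and names are assumptions.

def curriculum_params(stage: int, max_stage: int = 4):
    """Return (target_speed in m/s, smoothness penalty weight) for a stage.

    Difficulty (object speed) and the smoothness constraint increase
    together, so the agent first learns static grasping with a loose
    constraint, then faster targets under a tighter constraint."""
    frac = min(stage, max_stage) / max_stage
    target_speed = frac * 0.40          # static (0 m/s) up to 0.40 m/s
    smooth_weight = 0.1 + 0.9 * frac    # penalty weight grows with difficulty
    return target_speed, smooth_weight

def staged_reward(grasp_success: bool, dist_to_target: float,
                  accel_norm: float, jerk_norm: float,
                  smooth_weight: float) -> float:
    """Composite reward: dense approach shaping, sparse success bonus,
    and smoothness penalties on joint acceleration and jerk norms."""
    reward = -dist_to_target            # dense shaping toward the object
    if grasp_success:
        reward += 10.0                  # sparse bonus on successful grasp
    reward -= smooth_weight * (0.1 * accel_norm + 0.01 * jerk_norm)
    return reward
```

Coupling the smoothness weight to the stage index is what distinguishes this scheme from a SimpleCL-style curriculum that only raises target speed; the penalty discourages jerky joint trajectories as difficulty grows.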
Key words: dynamic grasping, reinforcement learning, curriculum learning, end-to-end learning, robotic manipulation
CLC Number: TP242.6
LIANG Yanyang, XIE Wenxuan, CUI Wei, LYU Hongfei, LI Da, ZHONG Dongzhou. End-to-end dynamic grasping method for robots based on curriculum reinforcement learning [J]. Journal of Computer Applications, DOI: 10.11772/j.issn.1001-9081.2025060749.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2025060749