Journal of Computer Applications
LIANG Yanyang1, XIE Wenxuan2, CUI Wei2, LYU Hongfei2, LI Da2, ZHONG Dongzhou3
Abstract: To address the low training efficiency of end-to-end dynamic grasping for robots, and the difficulty of balancing a high success rate with motion smoothness, a method based on Curriculum Reinforcement Learning (CRL) was proposed. First, a multi-modal input network fusing color images, depth maps, and robot proprioceptive states was constructed to map raw sensory data directly to continuous action commands for the end-effector. Then, a curriculum mechanism in which task difficulty and smoothness constraints increase synchronously was designed; combined with a staged composite reward function, it guided the agent to progressively master grasping, from static objects to dynamic ones. Finally, Domain Randomization (DR) was employed to enhance the policy's simulation-to-reality (Sim-to-Real) transfer capability. Simulation results show that the proposed method achieves a grasping success rate of nearly 100% at target speeds of 0.15 to 0.40 m/s, raising the upper speed limit for stable grasping from 0.25 m/s (the object-detection-based baseline) to 0.40 m/s. Compared with Simple Curriculum Learning (SimpleCL), which increases difficulty only, the proposed method improves the success rate by 3.6 percentage points on the hardest test and reduces the average joint acceleration and jerk norms by 58.35% and 69.26%, respectively. In physical experiments, the grasping success rates for a static scene and two dynamic scenes are 95.0%, 90.0%, and 70.0%, respectively. By jointly optimizing task difficulty and behavioral constraints, the method achieves an effective balance between success rate and smoothness in robotic dynamic grasping.
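The core idea of the abstract, a curriculum in which task difficulty (target speed) and the smoothness constraint tighten together, combined with a staged composite reward, can be sketched as follows. This is a minimal illustration of the concept only: all function names, stage counts, speeds, and reward weights below are hypothetical, not the authors' actual implementation.

```python
# Illustrative sketch: synchronized difficulty/smoothness curriculum
# with a staged composite reward. Numbers and names are assumptions.

def curriculum_params(stage: int, max_stage: int = 4):
    """Return (target_speed in m/s, smoothness penalty weight) for a stage.

    Difficulty (object speed) and the smoothness constraint increase
    together, so the agent first learns static grasping with a loose
    constraint, then faster targets under a tighter constraint."""
    frac = min(stage, max_stage) / max_stage
    target_speed = frac * 0.40          # static (0 m/s) up to 0.40 m/s
    smooth_weight = 0.1 + 0.9 * frac    # penalty weight grows with difficulty
    return target_speed, smooth_weight

def staged_reward(grasp_success: bool, dist_to_target: float,
                  accel_norm: float, jerk_norm: float,
                  smooth_weight: float) -> float:
    """Composite reward: dense approach shaping, sparse success bonus,
    and smoothness penalties on joint acceleration and jerk norms."""
    reward = -dist_to_target            # dense shaping toward the object
    if grasp_success:
        reward += 10.0                  # sparse bonus on successful grasp
    reward -= smooth_weight * (0.1 * accel_norm + 0.01 * jerk_norm)
    return reward
```

Coupling the smoothness weight to the stage index is what distinguishes this scheme from a SimpleCL-style curriculum that only raises target speed; the penalty discourages jerky joint trajectories as difficulty grows.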
Key words: dynamic grasping, reinforcement learning, curriculum learning, end-to-end learning, robotic manipulation
CLC Number: TP242.6
LIANG Yanyang, XIE Wenxuan, CUI Wei, LYU Hongfei, LI Da, ZHONG Dongzhou. End-to-end dynamic grasping method for robots based on curriculum reinforcement learning [J]. Journal of Computer Applications, DOI: 10.11772/j.issn.1001-9081.2025060749.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2025060749