Journal of System Simulation, 2026, Vol. 38, Issue (2): 321-331. doi: 10.16182/j.issn1004731x.joss.25-0768

• Machine Learning Algorithms •

Simulation of Robotic Arm Ball-catching Strategy Based on Curriculum RL of Transformer

Zhang Ziyao1, Ji Yunfeng2

  1. School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
  2. Institute of Machine Intelligence, University of Shanghai for Science and Technology, Shanghai 200093, China
  • Received: 2025-08-12 Revised: 2025-09-22 Online: 2026-02-18 Published: 2026-02-11
  • Contact: Ji Yunfeng
  • About the first author: Zhang Ziyao (born 2001), female, master's student; research interests: reinforcement learning and digital twins.
  • Funding:
    National Natural Science Foundation of China (62403319)

Abstract:

To address the difficult convergence and low training efficiency of traditional reinforcement learning (RL) methods on complex, dynamic, high-degree-of-freedom tasks such as robotic arm ball catching, a method is proposed that integrates the proximal policy optimization (PPO) algorithm with a Transformer network architecture and introduces a curriculum learning strategy. The Transformer captures the complex high-dimensional dependencies among the robotic arm's state space, the ball's trajectory, and the physical parameters of the environment. Curriculum learning designs the training objectives from simple to difficult, progressively increasing the catching difficulty. Experimental results show that, under the same conditions, the method raises the ball-catching success rate by more than 60% compared with traditional PPO and catches balls whose trajectories carry real-world disturbance characteristics with high accuracy. The method improves the performance and efficiency of dynamic catching by robotic arms under both simulated and real-world disturbance conditions, and offers a new approach to controlling complex tasks in real-world scenarios.

Key words: RL, curriculum learning, Transformer, robotic arm, ball-catching control
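
The simple-to-difficult curriculum mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the stage definitions or the promotion rule, so everything below is an assumption made for illustration rather than the authors' setup: the `Stage` fields (catch tolerance and ball speed limit), the 0.8 success threshold, and the 100-episode window are all hypothetical. The sketch only shows the scheduling idea, promoting the agent to a harder stage once its rolling success rate is high enough; the PPO update and the Transformer policy are out of scope here.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One curriculum stage (all values are hypothetical, not from the paper)."""
    name: str
    catch_radius_m: float   # tolerance for counting a catch as successful
    max_ball_speed: float   # upper bound on launch speed, in m/s


class Curriculum:
    """Advance to a harder stage once the rolling success rate is high enough."""

    def __init__(self, stages, promote_at=0.8, window=100):
        if not stages:
            raise ValueError("at least one stage is required")
        self.stages = list(stages)
        self.promote_at = promote_at      # assumed promotion threshold
        self.window = window              # assumed number of recent episodes
        self.index = 0
        self.results = deque(maxlen=window)

    @property
    def stage(self):
        return self.stages[self.index]

    def record(self, success):
        """Log one episode outcome; return True if the stage advanced."""
        self.results.append(bool(success))
        window_full = len(self.results) == self.window
        rate = sum(self.results) / len(self.results)
        on_last_stage = self.index == len(self.stages) - 1
        if window_full and rate >= self.promote_at and not on_last_stage:
            self.index += 1
            self.results.clear()  # statistics restart on the new stage
            return True
        return False


# Hypothetical three-stage curriculum: tighter tolerance, faster balls.
stages = [
    Stage("easy", catch_radius_m=0.20, max_ball_speed=2.0),
    Stage("medium", catch_radius_m=0.10, max_ball_speed=4.0),
    Stage("hard", catch_radius_m=0.05, max_ball_speed=6.0),
]
```

In a training loop, the environment would be configured from `curriculum.stage` before each episode and `curriculum.record(caught)` called afterwards. Clearing the window on promotion is a design choice that prevents successes earned on an easier stage from counting towards the next promotion.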

CLC number: