Journal of System Simulation ›› 2025, Vol. 37 ›› Issue (6): 1462-1473. DOI: 10.16182/j.issn1004731x.joss.24-0122



Dynamic Path Planning for Robotic Arms Based on an Improved PPO Algorithm

Wan Yuhang1, Zhu Zilu1, Zhong Chunfu1, Liu Yongkui1, Lin Tingyu2, Zhang Lin3   

  1. School of Mechano-Electronic Engineering, Xidian University, Xi'an 710071, China
  2. Beijing Complex Product Advanced Manufacturing Engineering Research Center, Beijing Simulation Center, Beijing 100854, China
  3. School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China
  • Received: 2024-01-31  Revised: 2024-04-08  Online: 2025-06-20  Published: 2025-06-18
  • Contact: Liu Yongkui
  • First author: Wan Yuhang (1998-), male, master's student; his research interests include robot learning and path planning.
  • Supported by: National Natural Science Foundation of China (61973243); Fundamental Research Funds for the Central Universities – Graduate Innovation Fund of Xidian University (YJSJ24001)


Abstract:

To address the increased environmental uncertainty and the difficulty of modeling faced by robotic arm path planning in unstructured environments, a dynamic path planning method for robotic arms based on an improved proximal policy optimization (PPO) algorithm is proposed. To handle the variable-length state-space input caused by the changing number of obstacles in a dynamic environment, an environmental state input processing method based on an LSTM network is proposed and the network structure of the PPO algorithm is modified accordingly. A reward function is designed based on the artificial potential field method, and a collision detection model of the robotic arm is established. Experimental results indicate that the improved algorithm adapts to changes in the number and positions of obstacles in the scene and achieves faster convergence and better stability than the traditional PPO algorithm.
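As a rough illustration of the variable-length state handling described above (a minimal sketch under assumed dimensions and module names, not the authors' published implementation), the per-obstacle observations can be encoded by an LSTM whose final hidden state is concatenated with the fixed-size arm/goal state before the PPO actor and critic heads:

import torch
import torch.nn as nn

# Sketch only: an LSTM encoder compresses a variable number of obstacle
# observations into one fixed-size feature vector for the PPO actor/critic.
# obstacle_dim, arm_dim, hidden sizes and the action dimension are assumptions.
class LSTMStatePPONet(nn.Module):
    def __init__(self, obstacle_dim=6, arm_dim=12, hidden=64, action_dim=6):
        super().__init__()
        self.encoder = nn.LSTM(obstacle_dim, hidden, batch_first=True)
        self.actor = nn.Sequential(
            nn.Linear(hidden + arm_dim, 128), nn.Tanh(),
            nn.Linear(128, action_dim), nn.Tanh())      # e.g. joint increments
        self.critic = nn.Sequential(
            nn.Linear(hidden + arm_dim, 128), nn.Tanh(),
            nn.Linear(128, 1))                          # state value for PPO

    def forward(self, obstacles, arm_state):
        # obstacles: (batch, n_obstacles, obstacle_dim); n_obstacles may vary per episode
        _, (h, _) = self.encoder(obstacles)             # final hidden state summarizes all obstacles
        feat = torch.cat([h[-1], arm_state], dim=-1)    # fixed-size input regardless of n_obstacles
        return self.actor(feat), self.critic(feat)

# The same network accepts 3 obstacles here and any other count elsewhere.
net = LSTMStatePPONet()
action, value = net(torch.randn(1, 3, 6), torch.randn(1, 12))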

Key words: dynamic path planning, improved PPO algorithm, LSTM network, artificial potential field method, ML-Agents
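The artificial potential field (APF) based reward mentioned in the abstract and keywords can likewise be sketched as an attractive term toward the goal plus repulsive terms that activate only inside an influence radius around each obstacle; the coefficients, radius, and terminal bonuses below are illustrative assumptions rather than the values used in the paper:

import numpy as np

def apf_reward(ee_pos, goal_pos, obstacle_positions,
               k_att=1.0, k_rep=0.5, rho0=0.15, collided=False):
    # Attractive part: the closer the end effector is to the goal, the higher the reward.
    d_goal = np.linalg.norm(np.asarray(goal_pos) - np.asarray(ee_pos))
    reward = -k_att * d_goal
    # Repulsive part: only obstacles within the influence radius rho0 contribute.
    for obs in obstacle_positions:
        rho = np.linalg.norm(np.asarray(obs) - np.asarray(ee_pos))
        if rho < rho0:
            reward -= k_rep * (1.0 / max(rho, 1e-3) - 1.0 / rho0) ** 2
    # Terminal terms fed by the collision detection model and the goal check.
    if collided:
        reward -= 10.0
    elif d_goal < 0.02:
        reward += 10.0
    return reward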

CLC number: