Journal of System Simulation ›› 2025, Vol. 37 ›› Issue (6): 1462-1473. DOI: 10.16182/j.issn1004731x.joss.24-0122



Dynamic Path Planning for Robotic Arms Based on an Improved PPO Algorithm

Wan Yuhang1, Zhu Zilu1, Zhong Chunfu1, Liu Yongkui1, Lin Tingyu2, Zhang Lin3   

  1. School of Mechano-Electronic Engineering, Xidian University, Xi'an 710071, China
  2. Beijing Complex Product Advanced Manufacturing Engineering Research Center, Beijing Simulation Center, Beijing 100854, China
  3. School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China
  • Received: 2024-01-31  Revised: 2024-04-08  Online: 2025-06-20  Published: 2025-06-18
  • Contact: Liu Yongkui
  • First author: Wan Yuhang (1998-), male, master's student; his research interests include robot learning and path planning.
  • Supported by: National Natural Science Foundation of China (61973243); Fundamental Research Funds for the Central Universities – Graduate Innovation Fund of Xidian University (YJSJ24001)


Abstract:

To address the increased environmental uncertainty and the difficulty of modeling faced by robotic arm path planning in unstructured environments, a dynamic path planning method for robotic arms based on an improved proximal policy optimization (PPO) algorithm is proposed. To handle the variable-length state-space input caused by the changing number of obstacles in a dynamic environment, an environmental state input processing method based on an LSTM network is proposed and the network structure of the PPO algorithm is modified accordingly. A reward function is designed based on the artificial potential field method, and a collision detection model of the robotic arm is established. Experimental results indicate that the improved algorithm adapts to changes in the number and positions of obstacles in the scene and achieves faster convergence and better stability than the traditional PPO algorithm.
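As a rough illustration of the variable-length state handling described above (a minimal sketch under assumed dimensions and module names, not the authors' published implementation), the per-obstacle observations can be encoded by an LSTM whose final hidden state is concatenated with the fixed-size arm/goal state before the PPO actor and critic heads:

import torch
import torch.nn as nn

# Sketch only: an LSTM encoder compresses a variable number of obstacle
# observations into one fixed-size feature vector for the PPO actor/critic.
# obstacle_dim, arm_dim, hidden sizes and the action dimension are assumptions.
class LSTMStatePPONet(nn.Module):
    def __init__(self, obstacle_dim=6, arm_dim=12, hidden=64, action_dim=6):
        super().__init__()
        self.encoder = nn.LSTM(obstacle_dim, hidden, batch_first=True)
        self.actor = nn.Sequential(
            nn.Linear(hidden + arm_dim, 128), nn.Tanh(),
            nn.Linear(128, action_dim), nn.Tanh())      # e.g. joint increments
        self.critic = nn.Sequential(
            nn.Linear(hidden + arm_dim, 128), nn.Tanh(),
            nn.Linear(128, 1))                          # state value for PPO

    def forward(self, obstacles, arm_state):
        # obstacles: (batch, n_obstacles, obstacle_dim); n_obstacles may vary per episode
        _, (h, _) = self.encoder(obstacles)             # final hidden state summarizes all obstacles
        feat = torch.cat([h[-1], arm_state], dim=-1)    # fixed-size input regardless of n_obstacles
        return self.actor(feat), self.critic(feat)

# The same network accepts 3 obstacles here and any other count elsewhere.
net = LSTMStatePPONet()
action, value = net(torch.randn(1, 3, 6), torch.randn(1, 12))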

Key words: dynamic path planning, improved PPO algorithm, LSTM network, artificial potential field method, ML-Agents
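The artificial potential field (APF) based reward mentioned in the abstract and keywords can likewise be sketched as an attractive term toward the goal plus repulsive terms that activate only inside an influence radius around each obstacle; the coefficients, radius, and terminal bonuses below are illustrative assumptions rather than the values used in the paper:

import numpy as np

def apf_reward(ee_pos, goal_pos, obstacle_positions,
               k_att=1.0, k_rep=0.5, rho0=0.15, collided=False):
    # Attractive part: the closer the end effector is to the goal, the higher the reward.
    d_goal = np.linalg.norm(np.asarray(goal_pos) - np.asarray(ee_pos))
    reward = -k_att * d_goal
    # Repulsive part: only obstacles within the influence radius rho0 contribute.
    for obs in obstacle_positions:
        rho = np.linalg.norm(np.asarray(obs) - np.asarray(ee_pos))
        if rho < rho0:
            reward -= k_rep * (1.0 / max(rho, 1e-3) - 1.0 / rho0) ** 2
    # Terminal terms fed by the collision detection model and the goal check.
    if collided:
        reward -= 10.0
    elif d_goal < 0.02:
        reward += 10.0
    return reward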

CLC number: