Path Planning of Improved RRT Algorithm Based on Deep Reinforcement Learning

doi:10.16182/j.issn1004731x.joss.24-0494


Abstract:

The rapidly-exploring random tree (RRT) algorithm has low planning efficiency, poor safety, and limited practicability when planning global paths in complex three-dimensional environments, so it cannot meet the safety requirements of UAV flight paths. To address this, an improved SAC-RRT algorithm was proposed that fuses the soft actor-critic (SAC) deep reinforcement learning algorithm with RRT. A goal-point bias strategy and a dynamic step-size strategy, both driven by the SAC decision network, were designed to reduce the blindness of RRT's random exploration. A random-point correction process was designed that adjusts the position of each random point according to the action output by the decision network, improving path safety. Path-simplification and smoothing steps were also designed to further improve path safety. Three-dimensional scenarios of varying complexity were constructed. The planning results show that SAC-RRT effectively shortens both path length and planning time while improving path smoothness and safety.
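The abstract names three RRT modifications driven by the SAC decision network: goal-point bias, dynamic step size, and random-point correction. The sketch below shows where each one could plug into a plain 3D RRT loop. It is not the authors' implementation, and several parts are assumptions:

- The trained SAC network is replaced by a hypothetical `stub_policy`. It derives a goal-bias probability and a step size from the obstacle clearance at the nearest tree node.
- The random-point correction is a hypothetical radial push, `correct_sample`, away from nearby obstacles.
- Obstacles are assumed to be spheres `(cx, cy, cz, r)`.

```python
import math
import random


def segment_free(a, b, obstacles, resolution=0.5):
    """True if segment a-b misses every sphere (cx, cy, cz, r); approximate,
    checked by sampling points along the segment every `resolution` units."""
    n = max(1, math.ceil(math.dist(a, b) / resolution))
    for i in range(n + 1):
        p = tuple(a[k] + (b[k] - a[k]) * i / n for k in range(3))
        if any(math.dist(p, o[:3]) <= o[3] for o in obstacles):
            return False
    return True


def steer(a, b, step):
    """Move from a toward b by at most `step`."""
    d = math.dist(a, b)
    if d <= step:
        return b
    return tuple(a[k] + step * (b[k] - a[k]) / d for k in range(3))


def stub_policy(node, obstacles):
    """HYPOTHETICAL stand-in for the SAC decision network.
    Returns (goal_bias_probability, step_size) based on clearance at `node`."""
    clearance = min((math.dist(node, o[:3]) - o[3] for o in obstacles), default=50.0)
    step = min(max(clearance, 1.0), 10.0)   # longer steps in open space
    bias = 0.5 if clearance > 5.0 else 0.1  # head for the goal only when it is safe
    return bias, step


def correct_sample(q, obstacles, bounds, margin=2.0):
    """HYPOTHETICAL random-point correction: push a sample lying within `margin`
    of an obstacle surface radially outward, then clamp it to the bounds."""
    for cx, cy, cz, r in obstacles:
        c = (cx, cy, cz)
        d = math.dist(q, c)
        if 0.0 < d < r + margin:
            q = tuple(c[k] + (q[k] - c[k]) * (r + margin) / d for k in range(3))
    return tuple(min(max(q[k], lo), hi) for k, (lo, hi) in enumerate(bounds))


def rrt(start, goal, obstacles, bounds, policy=stub_policy,
        max_iter=3000, goal_tol=2.0, seed=0):
    """Goal-biased RRT with a policy-chosen step size; returns a waypoint list or None."""
    rng = random.Random(seed)
    nodes, parent = [start], [None]
    for _ in range(max_iter):
        q_rand = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        q_rand = correct_sample(q_rand, obstacles, bounds)
        i_near = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], q_rand))
        bias, step = policy(nodes[i_near], obstacles)
        target = goal if rng.random() < bias else q_rand   # goal-point bias
        q_new = steer(nodes[i_near], target, step)           # dynamic step size
        if not segment_free(nodes[i_near], q_new, obstacles):
            continue
        nodes.append(q_new)
        parent.append(i_near)
        if math.dist(q_new, goal) <= goal_tol and segment_free(q_new, goal, obstacles):
            if q_new != goal:
                nodes.append(goal)
                parent.append(len(nodes) - 2)
            path, i = [], len(nodes) - 1
            while i is not None:  # walk parent links back to the start
                path.append(nodes[i])
                i = parent[i]
            return path[::-1]
    return None  # no path found within max_iter
```

In the paper, these decisions come from a trained network. Swapping `stub_policy` for a learned policy leaves the rest of the loop unchanged.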

Key words: deep reinforcement learning, SAC algorithm, RRT algorithm, UAV, cubic B-spline
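The abstract mentions a path-simplification step and a smoothing step, and the keywords point to cubic B-splines. The exact procedures are not given here. As an assumption, the sketch below uses two common techniques:

- greedy line-of-sight shortcutting for simplification (`prune`);
- a uniform cubic B-spline with tripled end points for smoothing (`cubic_bspline`).

The function names and the spherical obstacle model are hypothetical.

```python
import math


def segment_free(a, b, obstacles, resolution=0.5):
    """Approximate collision test against spheres (cx, cy, cz, r) by dense sampling."""
    n = max(1, math.ceil(math.dist(a, b) / resolution))
    for i in range(n + 1):
        p = tuple(a[k] + (b[k] - a[k]) * i / n for k in range(3))
        if any(math.dist(p, o[:3]) <= o[3] for o in obstacles):
            return False
    return True


def prune(path, obstacles):
    """Greedy shortcutting: from each kept waypoint, jump to the farthest later
    waypoint reachable by a collision-free straight segment."""
    out, i = [path[0]], 0
    while i < len(path) - 1:
        j = len(path) - 1
        while j > i + 1 and not segment_free(path[i], path[j], obstacles):
            j -= 1
        out.append(path[j])
        i = j
    return out


def cubic_bspline(ctrl, samples_per_seg=10):
    """Uniform cubic B-spline over control points `ctrl`; end points are
    repeated three times so the curve starts and ends on them."""
    pts = [ctrl[0]] * 2 + list(ctrl) + [ctrl[-1]] * 2
    curve = []
    for s in range(len(pts) - 3):
        p0, p1, p2, p3 = pts[s:s + 4]
        for k in range(samples_per_seg):
            t = k / samples_per_seg
            # Uniform cubic B-spline basis functions (they sum to 1 for every t).
            w = ((1 - t) ** 3 / 6,
                 (3 * t**3 - 6 * t**2 + 4) / 6,
                 (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6,
                 t**3 / 6)
            curve.append(tuple(w[0] * p0[d] + w[1] * p1[d] + w[2] * p2[d] + w[3] * p3[d]
                               for d in range(3)))
    curve.append(tuple(ctrl[-1]))
    return curve
```

A B-spline only approximates its control polygon, so the smoothed curve can cut corners near obstacles. A practical planner would re-check the sampled curve for collisions.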

CLC number: TP18

Liang Xiuman, Liu Ziliang, Liu Zhendong. Path Planning of Improved RRT Algorithm Based on Deep Reinforcement Learning[J]. Journal of System Simulation, 2025, 37(10): 2578-2593.

Figures and tables: 27

[Figures 1-10: images not available]

Table 1  Environment parameter settings

$L_{max}$	$L_{min}$	$L_{soft}$	$L_{w}$	$\theta_{max}$/(°)	$\theta_{star}$/(°)
300	1	5	2	65	45

Table 2  SAC reward parameters

$P_{s}$	$P_{f}$	$P_{L}$	$P_{max\theta}$	$P_{goal}$	$P_{maxo}$
10	-20	5	-0.1	-0.1	-0.1

[Table 3; Figures 11-12; Table 4; Figures 13-15; Table 5; Figure 16; Table 6; Figures 17-18; Table 7; Figures 19-20: images not available]


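Parameter	Value
Actor network learning rate	0.003
Critic network learning rate	0.003
Soft update coefficient	0.005
Temperature parameter	0.8
Discount factor	0.97
Samples per episode	128
Maximum training episodes	3000
Replay buffer capacity	100000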
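Most parameters in this table correspond to standard SAC update rules:

- The discount factor and the temperature enter the entropy-regularized critic target.
- The soft update coefficient sets the Polyak averaging of the target networks.
- The buffer capacity bounds the experience pool.

Below is a minimal, framework-free sketch of those pieces, using the common twin-critic form of the SAC target. It makes two assumptions. The temperature is held fixed. The value 128 is treated as the minibatch size, although the table only says "samples per episode".

```python
import random
from collections import deque

GAMMA = 0.97                # discount factor (table)
ALPHA = 0.8                 # temperature parameter (table), held fixed in this sketch
TAU = 0.005                 # soft update coefficient (table)
BUFFER_CAPACITY = 100_000   # replay buffer capacity (table)


def soft_q_target(reward, done, q1_next, q2_next, logp_next, gamma=GAMMA, alpha=ALPHA):
    """Entropy-regularized TD target for SAC's critics:
    y = r + gamma * (1 - done) * (min(Q1', Q2') - alpha * log pi(a'|s'))."""
    return reward + gamma * (1.0 - done) * (min(q1_next, q2_next) - alpha * logp_next)


def soft_update(target_params, online_params, tau=TAU):
    """Polyak averaging: move each target parameter a fraction tau toward the online one."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]


class ReplayBuffer:
    """FIFO experience pool; the oldest transitions are dropped once it is full."""

    def __init__(self, capacity=BUFFER_CAPACITY, seed=0):
        self.data = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, transition):
        self.data.append(transition)

    def sample(self, batch_size=128):  # 128 read as minibatch size (assumption)
        return self.rng.sample(list(self.data), min(batch_size, len(self.data)))
```

A low-entropy action has a strongly negative `log pi`, so `-alpha * log pi` raises the target. The larger the temperature (0.8 here), the more exploration is rewarded relative to the environment reward.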

Algorithm	Time/s	Path length	Nodes	Samples
RRT	7.41	243.61	25	136
SAC-RRT	3.24	229.54	12	37

Algorithm	Time/s	Path length	Nodes	Samples
RRT	7.81	237.01	25	136
RRT-Connect	4.96	262.25	33	46
RRT*-Connect	5.78	185.84	4	63
SAC-RRT	3.34	189.54	3	39

Algorithm	Time/s	Path length	Nodes	Samples
RRT	10.63	256.69	45	234
RRT-Connect	5.26	276.96	32	69
RRT*-Connect	6.77	186.73	5	83
SAC-RRT	3.9	193.34	5	46
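To make the three comparison tables easier to read, the snippet below computes SAC-RRT's percentage reduction on each metric against every baseline. It uses only the values printed in the tables. The keys `first`, `second` and `third` follow the order of the tables above; they are not the paper's table numbers, which cannot be recovered here.

```python
METRICS = ("time_s", "length", "nodes", "samples")

# Rows copied from the three comparison tables: (time/s, path length, nodes, samples).
RESULTS = {
    "first": {"RRT": (7.41, 243.61, 25, 136),
              "SAC-RRT": (3.24, 229.54, 12, 37)},
    "second": {"RRT": (7.81, 237.01, 25, 136),
               "RRT-Connect": (4.96, 262.25, 33, 46),
               "RRT*-Connect": (5.78, 185.84, 4, 63),
               "SAC-RRT": (3.34, 189.54, 3, 39)},
    "third": {"RRT": (10.63, 256.69, 45, 234),
              "RRT-Connect": (5.26, 276.96, 32, 69),
              "RRT*-Connect": (6.77, 186.73, 5, 83),
              "SAC-RRT": (3.9, 193.34, 5, 46)},
}


def reductions(table, ours="SAC-RRT"):
    """Percentage reduction of each metric for `ours` relative to every baseline,
    rounded to 0.1; a negative value means `ours` is larger on that metric."""
    return {name: {m: round(100.0 * (b - o) / b, 1)
                   for m, b, o in zip(METRICS, row, table[ours])}
            for name, row in table.items() if name != ours}


for key, table in RESULTS.items():
    for baseline, red in reductions(table).items():
        print(key, baseline, red)
```

In the first table, SAC-RRT cuts planning time against RRT by about 56.3% and the number of samples by about 72.8%. The path-length entries against RRT*-Connect come out negative, about -2.0% and -3.5%. This means RRT*-Connect produced slightly shorter paths in those two tables.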