A USV Path Planning Algorithm under Special Environment Based on TD3-RRT

doi:10.16182/j.issn1004731x.joss.24-0622

Abstract:

For USV path planning in special environments with multiple obstacles, large obstacles, or narrow passages, the rapidly-exploring random tree (RRT) algorithm suffers from a large sampling base, a low planning success rate, and tortuous planned paths. To address these problems, a global path planning algorithm, TD3-RRT, is proposed based on the twin delayed deep deterministic policy gradient (TD3). A USV path search model is built by combining the RRT algorithm with deep reinforcement learning: forward-looking detection senses the environment to adaptively adjust the expansion step size, and the policy network outputs the path search direction, which addresses the blind expansion of RRT. The hindsight experience replay strategy is improved by re-selecting virtual goals and sampling from dual experience replay pools, which strengthens path search in complex environments. A reward function is designed to improve the quality of the planned path and accelerate the path search. Experimental results show that, across different environments, TD3-RRT achieves a higher planning success rate than current mainstream algorithms and improves steering angle, path length, and planning time. This indicates that the improved algorithm speeds up path search, improves path quality, and adapts well to different environments.

Key words: TD3 algorithm, path planning, special environment, RRT algorithm, USV, hindsight experience replay
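
The expansion step described in the abstract (a policy chooses the search direction, and forward-looking detection sets the step size) might look roughly like the sketch below. This is an illustration under stated assumptions, not the authors' implementation: the circular obstacle model, the ray-marching probe `free_distance`, the `safety` factor, the step limits, and the goal-directed `goal_directed_policy` (a stand-in for the trained TD3 policy network) are all hypothetical.

```python
import math
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]
Circle = Tuple[float, float, float]       # (cx, cy, radius); hypothetical obstacle model
Policy = Callable[[Point, Point], float]  # maps (node, goal) to a heading in radians


def free_distance(p: Point, heading: float, obstacles: List[Circle],
                  max_range: float, resolution: float = 0.1) -> float:
    """Forward-looking probe: march along `heading` until an obstacle is hit.

    Returns the distance to the first sampled point inside an obstacle,
    or `max_range` if the ray stays clear.
    """
    d = 0.0
    while d < max_range:
        x = p[0] + d * math.cos(heading)
        y = p[1] + d * math.sin(heading)
        if any(math.hypot(x - cx, y - cy) <= r for cx, cy, r in obstacles):
            return d
        d += resolution
    return max_range


def goal_directed_policy(node: Point, goal: Point) -> float:
    """Placeholder for the trained policy network: head straight for the goal."""
    return math.atan2(goal[1] - node[1], goal[0] - node[0])


def extend(node: Point, goal: Point, obstacles: List[Circle], policy: Policy,
           max_step: float = 5.0, min_step: float = 0.5,
           safety: float = 0.8) -> Optional[Point]:
    """Grow the tree by one node; the step shrinks as the probed clearance shrinks."""
    heading = policy(node, goal)
    clearance = free_distance(node, heading, obstacles, max_range=max_step / safety)
    step = min(max_step, safety * clearance)  # assumed adaptation rule
    if step < min_step:
        return None  # blocked; a learned policy would propose another heading
    return (node[0] + step * math.cos(heading),
            node[1] + step * math.sin(heading))


if __name__ == "__main__":
    obstacles = [(10.0, 0.0, 2.0)]
    goal = (20.0, 0.0)
    node: Optional[Point] = (0.0, 0.0)
    for _ in range(5):  # bounded demo loop
        if node is None:
            break
        print(node)
        node = extend(node, goal, obstacles, goal_directed_policy)
```

With the straight-line placeholder policy, the step is long in open water, shortens in front of the obstacle, and the expansion is finally rejected. A learned policy would instead be expected to steer around the obstacle at that point.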

CLC number: TP391.9

Chen Jitong, Zhou Jiajia, Wu Di, et al. A USV Path Planning Algorithm under Special Environment Based on TD3-RRT[J]. Journal of System Simulation, 2025, 37(11): 2888-2903.



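Hyperparameter	Value
Discount factor	0.99
Learning rate	0.0003
Soft update parameter	0.005
Replay buffer capacity	10⁶
Batch size	512
Policy network update frequency	2
Training episodes	2000
Maximum steps (training)	1000
Maximum steps (testing)	500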
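
For orientation, the hyperparameters above can be collected into a configuration object, together with the two TD3 mechanisms they control: soft (Polyak) target updates and delayed policy updates. This is a minimal sketch. The field names are hypothetical, and reading the policy network update frequency of 2 as "one policy update per two critic updates" is an assumption based on how standard TD3 works. The table does not say whether the learning rate applies to the policy network, the critics, or both.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TD3Config:
    """Values copied from the hyperparameter table; field names are illustrative."""
    gamma: float = 0.99            # discount factor
    learning_rate: float = 3e-4    # learning rate (networks not specified in the table)
    tau: float = 0.005             # soft update parameter
    buffer_capacity: int = 10**6   # replay buffer capacity
    batch_size: int = 512
    policy_delay: int = 2          # policy network update frequency
    train_episodes: int = 2000
    max_train_steps: int = 1000
    max_test_steps: int = 500


def soft_update(target: List[float], online: List[float], tau: float) -> List[float]:
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]


def should_update_policy(critic_updates: int, policy_delay: int) -> bool:
    """Delayed update: refresh the policy once every `policy_delay` critic updates."""
    return critic_updates % policy_delay == 0


if __name__ == "__main__":
    cfg = TD3Config()
    print(soft_update([0.0, 1.0], [1.0, 1.0], cfg.tau))
    print([should_update_policy(i, cfg.policy_delay) for i in range(1, 5)])
```

With a soft update parameter of 0.005, each update moves the target weights only 0.5% of the way toward the online weights, which keeps the bootstrapped targets slowly varying.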

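Environment	Algorithm	Average path length (m)	Average running time (s)
Environment I	Algorithm 1	138.962	0.058
Environment I	Algorithm 2	140.969	0.045
Environment II	Algorithm 1	139.543	0.060
Environment II	Algorithm 2	131.417	0.044
Environment IV	Algorithm 1	142.743	0.077
Environment IV	Algorithm 2	151.549	0.051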

Algorithm	Average path length (m)	Average running time (s)	Average steering angle (°)	Minimum steps
RRT	161.327	0.074	3264.6	452
RRT*	137.763	0.473	210.0	406
RRT-connect	156.976	0.102	2893.9	423
DQN-RRT	157.044	0.069	1924.8	59
Proposed (TD3-RRT)	140.969	0.046	303.6	68

Algorithm	Average path length (m)	Average running time (s)	Average steering angle (°)	Minimum steps
RRT	156.730	0.087	3305.9	141
RRT*	131.433	1.792	324.1	289
RRT-connect	151.971	0.077	2006.3	112
DQN-RRT	145.120	0.626	2170.1	72
Proposed (TD3-RRT)	130.843	0.033	235.2	48
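
To read the comparison table directly above as relative gains, the reduction of each metric with respect to a baseline can be computed as 100 × (baseline − proposed) / baseline. The short script below is only a reading aid: the numbers are copied from that table, while the dictionary layout and function names are illustrative and not from the paper.

```python
from typing import Dict, Tuple

METRICS: Tuple[str, ...] = ("path_length_m", "running_time_s", "steering_angle_deg", "min_steps")

# Rows copied from the table directly above; "TD3-RRT" is the proposed-algorithm row.
ROWS: Dict[str, Tuple[float, float, float, float]] = {
    "RRT": (156.730, 0.087, 3305.9, 141),
    "RRT*": (131.433, 1.792, 324.1, 289),
    "RRT-connect": (151.971, 0.077, 2006.3, 112),
    "DQN-RRT": (145.120, 0.626, 2170.1, 72),
    "TD3-RRT": (130.843, 0.033, 235.2, 48),
}


def relative_reduction(baseline: float, value: float) -> float:
    """Percentage by which `value` is lower than `baseline` (negative if higher)."""
    return 100.0 * (baseline - value) / baseline


def compare(proposed: str = "TD3-RRT") -> Dict[str, Dict[str, float]]:
    """Reduction of every metric of `proposed` relative to each other algorithm."""
    ours = ROWS[proposed]
    return {
        name: {m: relative_reduction(b, o) for m, b, o in zip(METRICS, row, ours)}
        for name, row in ROWS.items()
        if name != proposed
    }


if __name__ == "__main__":
    for name, gains in compare().items():
        print(name, {m: round(g, 1) for m, g in gains.items()})
```

All four metrics are lower-is-better here, so a positive percentage means the proposed algorithm improves on the baseline for that metric.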

Algorithm	Average path length (m)	Average running time (s)	Average steering angle (°)	Minimum steps
RRT	152.762	0.055	2246.8	337
RRT*	135.544	0.311	224.0	289
RRT-connect	152.411	0.097	2239.3	88
DQN-RRT	178.228	0.294	1916.9	76
Proposed (TD3-RRT)	143.720	0.084	246.2	50