A USV Path Planning Algorithm under Special Environment Based on TD3-RRT

doi:10.16182/j.issn1004731x.joss.24-0622

Abstract

When planning paths for an unmanned surface vehicle (USV) in special environments such as multiple obstacles, large obstacles, and narrow passages, the rapidly-exploring random tree (RRT) algorithm suffers from a large number of required samples, a low success rate, and zigzagging planned paths. To address these problems, a global path planning algorithm (TD3-RRT) based on the twin delayed deep deterministic policy gradient (TD3) is proposed. A USV path search model is established by combining the RRT algorithm with deep reinforcement learning. Forward-looking detection is used to sense the environment and adaptively adjust the step size. The path search direction is output by the policy network, which addresses the blind expansion of the RRT algorithm. An improved hindsight experience replay strategy is proposed, which enhances the path search capability in complex environments by re-selecting the virtual targets and sampling from two experience replay pools. A reward function is designed to improve the quality of the planned path and speed up the path search. Experimental results show that, across different environments and compared with current mainstream algorithms, TD3-RRT improves the path planning success rate and reduces the redundant steering angle, path length, and path planning time. This indicates that the improved algorithm speeds up the path search and improves path quality, and that it adapts well to different environments.
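The expansion step described in the abstract (a direction supplied by a policy, a step size adapted by forward-looking detection) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the circular obstacle model, the linear clearance-to-step rule, the parameters `min_step`, `max_step`, and `look_ahead`, and the `goal_heading_policy` stand-in, which simply points at the goal in place of a trained TD3 actor. It is not the authors' implementation, and it checks only the new node for collision, not the connecting segment.

```python
import math
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]
Circle = Tuple[float, float, float]  # (cx, cy, radius): hypothetical obstacle model


def clearance(p: Point, obstacles: List[Circle]) -> float:
    """Distance from p to the nearest obstacle boundary (negative if inside)."""
    if not obstacles:
        return math.inf
    return min(math.hypot(p[0] - cx, p[1] - cy) - r for cx, cy, r in obstacles)


def adaptive_step(p: Point, heading: float, obstacles: List[Circle],
                  min_step: float = 1.0, max_step: float = 5.0,
                  look_ahead: float = 10.0) -> float:
    """Probe ahead along `heading` and shorten the step when obstacles are near."""
    probe = (p[0] + look_ahead * math.cos(heading),
             p[1] + look_ahead * math.sin(heading))
    free = min(clearance(p, obstacles), clearance(probe, obstacles))
    # Assumed rule: map clearance linearly onto [min_step, max_step].
    ratio = max(0.0, min(1.0, free / look_ahead))
    return min_step + ratio * (max_step - min_step)


def goal_heading_policy(p: Point, goal: Point) -> float:
    """Stand-in for a trained actor network: heading straight at the goal."""
    return math.atan2(goal[1] - p[1], goal[0] - p[0])


def extend(p: Point, goal: Point, obstacles: List[Circle],
           policy: Callable[[Point, Point], float]) -> Optional[Point]:
    """Grow the tree by one node in the policy's direction; None on collision."""
    heading = policy(p, goal)
    step = adaptive_step(p, heading, obstacles)
    new = (p[0] + step * math.cos(heading), p[1] + step * math.sin(heading))
    if clearance(new, obstacles) <= 0.0:
        return None  # new node lies on or inside an obstacle
    return new
```

In open water this sketch takes the full `max_step`; with an obstacle touching the look-ahead probe it falls back to `min_step`, which is the qualitative behaviour the abstract attributes to forward-looking detection.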

Key words: TD3 algorithm, path planning, special environment, RRT algorithm, USV, hindsight experience replay

CLC Number: TP391.9

Chen Jitong, Zhou Jiajia, Wu Di, Jiang Hailong. A USV Path Planning Algorithm under Special Environment Based on TD3-RRT[J]. Journal of System Simulation, 2025, 37(11): 2888-2903.

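Hyperparameter	Value
Discount factor	0.99
Learning rate	0.0003
Soft update parameter	0.005
Replay buffer capacity	10⁶
Batch size	512
Policy network update frequency	2
Training episodes	2000
Maximum steps per training episode	1000
Maximum steps per test episode	500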
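The hyperparameters above can be collected into a small configuration object, together with the two standard TD3 mechanisms that the soft update parameter and the policy network update frequency usually control. This is a sketch under assumptions: the class and field names are hypothetical, network weights are represented as plain lists of floats, and the update frequency of 2 is read as "one policy update per two critic updates", which is the usual TD3 convention but is not confirmed by this page.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TD3Config:
    """Values taken from the hyperparameter table; field names are hypothetical."""
    gamma: float = 0.99             # discount factor
    learning_rate: float = 3e-4     # learning rate
    tau: float = 0.005              # soft update parameter
    buffer_capacity: int = 10**6    # replay buffer capacity
    batch_size: int = 512           # batch size
    policy_delay: int = 2           # policy network update frequency
    train_episodes: int = 2000      # training episodes
    max_train_steps: int = 1000     # maximum steps per training episode
    max_test_steps: int = 500       # maximum steps per test episode


def soft_update(target: List[float], online: List[float], tau: float) -> List[float]:
    """Polyak averaging: move each target weight a fraction tau toward the online weight."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target, online)]


def should_update_policy(critic_updates: int, policy_delay: int) -> bool:
    """Delayed policy update: the actor is updated once every policy_delay critic updates."""
    return critic_updates % policy_delay == 0
```

With `tau = 0.005`, a target weight moves 0.5% of the way toward its online counterpart on each soft update, so the target networks change slowly relative to the networks being trained.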

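Environment	Algorithm	Average path length (m)	Average running time (s)
Environment I	Algorithm 1	138.962	0.058
Environment I	Algorithm 2	140.969	0.045
Environment II	Algorithm 1	139.543	0.060
Environment II	Algorithm 2	131.417	0.044
Environment IV	Algorithm 1	142.743	0.077
Environment IV	Algorithm 2	151.549	0.051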

Algorithm	Average path length (m)	Average running time (s)	Average steering angle (°)	Minimum number of steps
RRT	161.327	0.074	3264.6	452
RRT*	137.763	0.473	210.0	406
RRT-connect	156.976	0.102	2893.9	423
DQN-RRT	157.044	0.069	1924.8	59
Proposed algorithm	140.969	0.046	303.6	68
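To make the comparison in the table above easier to read, the relative reduction of the proposed algorithm against the basic RRT baseline can be computed directly from the tabulated values. The helper name `percent_reduction` and the dictionary keys are illustrative choices; the numbers are copied from the RRT row and the proposed-algorithm row.

```python
def percent_reduction(baseline: float, value: float) -> float:
    """Relative reduction of `value` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - value) / baseline


# Rows copied from the table: basic RRT versus the proposed algorithm.
rrt = {"path_length_m": 161.327, "run_time_s": 0.074, "steering_angle_deg": 3264.6}
proposed = {"path_length_m": 140.969, "run_time_s": 0.046, "steering_angle_deg": 303.6}

reductions = {key: percent_reduction(rrt[key], proposed[key]) for key in rrt}
for key, pct in reductions.items():
    print(f"{key}: {pct:.1f}% lower than RRT")
```

For this table the reductions come to roughly 12.6% in average path length, 37.8% in average running time, and 90.7% in average steering angle.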

Algorithm	Average path length (m)	Average running time (s)	Average steering angle (°)	Minimum number of steps
RRT	156.730	0.087	3305.9	141
RRT*	131.433	1.792	324.1	289
RRT-connect	151.971	0.077	2006.3	112
DQN-RRT	145.120	0.626	2170.1	72
Proposed algorithm	130.843	0.033	235.2	48

Algorithm	Average path length (m)	Average running time (s)	Average steering angle (°)	Minimum number of steps
RRT	152.762	0.055	2246.8	337
RRT*	135.544	0.311	224.0	289
RRT-connect	152.411	0.097	2239.3	88
DQN-RRT	178.228	0.294	1916.9	76
Proposed algorithm	143.720	0.084	246.2	50