Journal of System Simulation (系统仿真学报), 2025, Vol. 37, Issue (11): 2888-2903. doi: 10.16182/j.issn1004731x.joss.24-0622

• Papers

基于TD3-RRT的特殊环境下USV路径规划算法研究

陈际同1, 周佳加1, 吴迪2, 江海龙3   

  1. 哈尔滨工程大学 智能科学与工程学院,黑龙江 哈尔滨 150000
  2. 哈尔滨工程大学 青岛创新发展基地,山东 青岛 266000
  3. 中国电子科技集团公司第二十九研究所,四川 成都 610000
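  • Received: 2024-06-11 Revised: 2024-08-05 Online: 2025-11-18 Published: 2025-11-27
  • Corresponding author: Zhou Jiajia (周佳加)
  • First author: Chen Jitong (陈际同, born 2000), male, master's student; research interest: deep reinforcement learning.
  • Funding:
    Research on Full-Sea-Area Positioning Methods for Unmanned Underwater Vehicles Based on Graph Networks (51909044)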

A USV Path Planning Algorithm under Special Environment Based on TD3-RRT

Chen Jitong1, Zhou Jiajia1, Wu Di2, Jiang Hailong3   

  1. College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150000, China
  2. Qingdao Innovation and Development Center, Harbin Engineering University, Qingdao 266000, China
  3. The 29th Institute of China Electronics Technology Group Corporation, Chengdu 610000, China
  • Received: 2024-06-11 Revised: 2024-08-05 Online: 2025-11-18 Published: 2025-11-27
  • Contact: Zhou Jiajia

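摘要 (Abstract):

For USV path planning in special environments with many obstacles, large obstacles, or narrow passages, the rapidly-exploring random tree (RRT) algorithm suffers from a large sampling base, a low planning success rate, and tortuous planned paths. A global path planning algorithm, TD3-RRT, is proposed based on twin delayed deep deterministic policy gradient (TD3). A USV path search model is built by combining the RRT algorithm with deep reinforcement learning. Forward-looking detection senses the environment so that the expansion step size is adjusted adaptively, and a policy network outputs the path search direction, which addresses the blind expansion of RRT. The hindsight experience replay strategy is improved with strategies such as reselecting virtual goals and sampling from two experience replay pools, to strengthen path search in complex environments. A reward function is used to improve the quality of the planned path and speed up the search. Experimental results show that, across different environments, TD3-RRT achieves a higher planning success rate than current mainstream algorithms and optimizes steering angle, path length, and planning time. This demonstrates that the improved algorithm speeds up path search, improves path quality, and adapts well to different environments.

关键词 (Keywords): twin delayed deep deterministic policy gradient algorithm, path planning, special environment, rapidly-exploring random tree algorithm, USV, hindsight experience replay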
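
The expansion step described in the abstract (sense ahead, adapt the step size, let a policy choose the direction) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the circular obstacle model, the ray-probing sensor, the rule that sets the step to half the free distance, and the `greedy_policy` function, which merely stands in for a trained TD3 actor network.

```python
import math
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]
Circle = Tuple[float, float, float]  # (cx, cy, radius): hypothetical obstacle model


def clearance_ahead(p: Point, heading: float, obstacles: List[Circle],
                    max_range: float, probe: float = 0.1) -> float:
    """Free distance along `heading` from `p`, capped at `max_range`."""
    for i in range(int(max_range / probe) + 1):
        d = i * probe
        x = p[0] + d * math.cos(heading)
        y = p[1] + d * math.sin(heading)
        if any(math.hypot(x - cx, y - cy) <= r for cx, cy, r in obstacles):
            return d
    return max_range


def adaptive_step(clearance: float, min_step: float, max_step: float) -> float:
    """Assumed rule: half the free distance, clamped to [min_step, max_step]."""
    return max(min_step, min(max_step, 0.5 * clearance))


def greedy_policy(node: Point, goal: Point) -> float:
    """Placeholder for a learned actor: head straight for the goal."""
    return math.atan2(goal[1] - node[1], goal[0] - node[0])


def extend(node: Point, goal: Point, policy: Callable[[Point, Point], float],
           obstacles: List[Circle], min_step: float = 0.2,
           max_step: float = 2.0, sense_range: float = 5.0) -> Optional[Point]:
    """One tree expansion: the policy picks the direction, sensing sets the step."""
    heading = policy(node, goal)
    free = clearance_ahead(node, heading, obstacles, sense_range)
    step = adaptive_step(free, min_step, max_step)
    if step >= free:  # even the clamped step would reach the obstacle
        return None
    return (node[0] + step * math.cos(heading),
            node[1] + step * math.sin(heading))
```

In open water `extend` takes the maximum step, while near an obstacle the step shrinks, and the expansion is rejected when no safe step remains. A learned policy would replace `greedy_policy` without changing the rest of the loop.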

Abstract:

For USV path planning in special environments, such as those with multiple obstacles, large obstacles, or narrow passages, the rapidly-exploring random tree (RRT) algorithm suffers from a large sampling base, a low planning success rate, and zigzagging planned paths. To address these problems, a global path planning algorithm (TD3-RRT) was proposed based on the twin delayed deep deterministic policy gradient (TD3). A USV path search model was established by combining the RRT algorithm with deep reinforcement learning. Forward-looking detection was used to sense the environment and adaptively adjust the expansion step size. The path search direction was output by the policy network to solve the problem of blind expansion in the RRT algorithm. An improved hindsight experience replay strategy was proposed, which enhanced the path search capability in complex environments by re-selecting virtual goals and sampling from two experience replay pools. A reward function was designed to improve the quality of the planned path and accelerate the path search. Experimental results show that, across different environments, TD3-RRT improves the path planning success rate and optimizes the redundant steering angle, path length, and path planning time compared with current mainstream algorithms. This demonstrates that the improved algorithm speeds up path search and improves path quality. Furthermore, it adapts well to different environments.

Key words: TD3 algorithm, path planning, special environment, RRT algorithm, USV, hindsight experience replay
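
Hindsight experience replay, which the abstract names as one of the improved components, can be sketched as follows. The abstract does not specify how virtual goals are re-selected or how the two replay pools are sampled, so this sketch falls back on the standard "future" relabeling strategy and a fixed mixing ratio between a real pool and a hindsight pool. The transition layout, the sparse reward, and all function names are hypothetical.

```python
import random
from typing import List, Tuple

State = Tuple[float, float]
# (state, action, reward, next_state, goal): hypothetical transition layout
Transition = Tuple[State, float, float, State, State]


def sparse_reward(achieved: State, goal: State, tol: float = 0.5) -> float:
    """0 when the achieved position is within `tol` of the goal, else -1."""
    dist = math_dist(achieved, goal)
    return 0.0 if dist <= tol else -1.0


def math_dist(a: State, b: State) -> float:
    """Euclidean distance between two 2-D points."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5


def relabel_future(episode: List[Transition], k: int,
                   rng: random.Random) -> List[Transition]:
    """Standard 'future' HER: reuse a later achieved state as a virtual goal."""
    out: List[Transition] = []
    for t, (s, a, _, s_next, _) in enumerate(episode):
        for _ in range(k):
            future = rng.randrange(t, len(episode))
            virtual_goal = episode[future][3]
            reward = sparse_reward(s_next, virtual_goal)
            out.append((s, a, reward, s_next, virtual_goal))
    return out


def sample_mixed(real_pool: List[Transition], hindsight_pool: List[Transition],
                 batch_size: int, hindsight_ratio: float,
                 rng: random.Random) -> List[Transition]:
    """Assumed dual-pool sampling: draw a fixed share of the batch from each pool."""
    n_h = min(len(hindsight_pool), int(batch_size * hindsight_ratio))
    n_r = min(len(real_pool), batch_size - n_h)
    return rng.sample(real_pool, n_r) + rng.sample(hindsight_pool, n_h)
```

Relabeling turns an episode that never reached its real goal into transitions that do reach a virtual one, which gives the critic non-trivial reward signal in environments where successful searches are rare.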

CLC number: