Journal of System Simulation ›› 2025, Vol. 37 ›› Issue (11): 2754-2767. doi: 10.16182/j.issn1004731x.joss.24-0678


AUV Path Planning Based on Behavior Cloning and Improved DQN in Partially Unknown Environments

Xing Lijing1, Li Min1, Zeng Xiangguang1, Zhang Ping2, Peng Bei2

  1. School of Mechanical Engineering, Southwest Jiaotong University, Chengdu 610031, China
  2. University of Electronic Science and Technology of China, Chengdu 610031, China
  • Received: 2024-06-26  Revised: 2024-09-11  Online: 2025-11-18  Published: 2025-11-27
  • Corresponding author: Li Min
  • First author: Xing Lijing (1999-), female, master's student; research interests: AUV path planning and deep reinforcement learning.
  • Funding:
    National Natural Science Foundation of China (52075456); Key R&D Program of the Science and Technology Department of Sichuan Province (2023YFG0285); Key R&D Program of the Science and Technology Department of Sichuan Province (2019ZDZX0020)



Abstract:

To address the large randomness and slow convergence of the DQN dynamic path planning algorithm for a single autonomous underwater vehicle (AUV) in a partially unknown environment, a path planning method combining behavior cloning, the A* algorithm, and DQN (BA_DQN) was proposed. Based on the known environmental information, an improved A* algorithm incorporating ocean current resistance was proposed to guide DQN, thereby reducing the randomness of the DQN algorithm. Considering the complexity of the marine environment, the positive experience pool was expanded and the sampling probability was then further adjusted to raise the training success rate. To address the slow convergence of DQN, an improved scheme of reinforcement learning followed by behavior cloning was proposed. BA_DQN was used to control AUV pathfinding, and simulation experiments were carried out in different task scenarios. The results show that BA_DQN requires less training time than DQN, and makes decisions faster and yields shorter sailing times than the A* algorithm.
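The abstract's idea of expanding a positive experience pool and biasing the sampling probability toward it can be illustrated with a minimal sketch. This is not the paper's implementation: the class name `DualReplayBuffer`, the parameter `p_pos`, and the two-pool layout are assumptions introduced here purely to show the general technique of oversampling successful transitions during DQN replay.

```python
import random
from collections import deque

class DualReplayBuffer:
    """Illustrative sketch only: a regular replay pool plus a separate
    pool of "positive" transitions (e.g. steps from episodes that
    reached the goal). Sampling draws from the positive pool with
    probability p_pos, biasing training toward successful experience."""

    def __init__(self, capacity=10000, p_pos=0.3, seed=None):
        self.regular = deque(maxlen=capacity)   # all transitions
        self.positive = deque(maxlen=capacity)  # successful transitions only
        self.p_pos = p_pos                      # chance of drawing from the positive pool
        self.rng = random.Random(seed)

    def add(self, transition, positive=False):
        # Every transition enters the regular pool; successful ones
        # are additionally stored in the positive pool.
        self.regular.append(transition)
        if positive:
            self.positive.append(transition)

    def sample(self, batch_size):
        # Each draw picks a pool first, then a uniform transition from it;
        # falls back to the regular pool while the positive pool is empty.
        batch = []
        for _ in range(batch_size):
            use_pos = bool(self.positive) and self.rng.random() < self.p_pos
            pool = self.positive if use_pos else self.regular
            batch.append(self.rng.choice(pool))
        return batch
```

Raising `p_pos` over the course of training (or, as the abstract suggests, adjusting the sampling probability after the positive pool has grown) is one way such a scheme can trade exploration of ordinary transitions against exploitation of known-good ones.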

Key words: AUV, path planning, A* algorithm, reinforcement learning, behavior cloning

CLC Number: