Journal of System Simulation ›› 2025, Vol. 37 ›› Issue (11): 2754-2767. doi: 10.16182/j.issn1004731x.joss.24-0678


AUV Path Planning Based on Behavior Cloning and Improved DQN in Partially Unknown Environments

Xing Lijing1, Li Min1, Zeng Xiangguang1, Zhang Ping2, Peng Bei2

  1. School of Mechanical Engineering, Southwest Jiaotong University, Chengdu 610031, China
  2. University of Electronic Science and Technology of China, Chengdu 610031, China
  • Received: 2024-06-26  Revised: 2024-09-11  Online: 2025-11-18  Published: 2025-11-27
  • Corresponding author: Li Min
  • First author: Xing Lijing (1999-), female, master's student; research interests: AUV path planning and deep reinforcement learning.
  • Funding:
    National Natural Science Foundation of China (52075456); Key R&D Program of the Science and Technology Department of Sichuan Province (2023YFG0285); Key R&D Program of the Science and Technology Department of Sichuan Province (2019ZDZX0020)



Abstract:

To address the large randomness and slow convergence of the DQN dynamic path planning algorithm for a single autonomous underwater vehicle (AUV) in a partially unknown environment, a path planning method combining behavior cloning, the A* algorithm, and DQN (BA_DQN) was proposed. Based on the known environmental information, an improved A* algorithm incorporating ocean current resistance was proposed to guide DQN, thereby reducing the randomness of the DQN algorithm. Considering the complexity of the marine environment, the positive experience pool was expanded and the sampling probability was then further adjusted to raise the training success rate. To address the slow convergence of DQN, an improved scheme of reinforcement learning followed by behavior cloning was proposed. BA_DQN was used to control AUV pathfinding, and simulation experiments were carried out in different task scenarios. The results show that BA_DQN requires less training time than DQN, and makes decisions faster and yields shorter sailing times than the A* algorithm.
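The abstract's idea of expanding a positive experience pool and biasing the sampling probability toward it can be illustrated with a minimal sketch. This is not the paper's implementation: the class name `DualReplayBuffer`, the parameter `p_pos`, and the two-pool layout are assumptions introduced here purely to show the general technique of oversampling successful transitions during DQN replay.

```python
import random
from collections import deque

class DualReplayBuffer:
    """Illustrative sketch only: a regular replay pool plus a separate
    pool of "positive" transitions (e.g. steps from episodes that
    reached the goal). Sampling draws from the positive pool with
    probability p_pos, biasing training toward successful experience."""

    def __init__(self, capacity=10000, p_pos=0.3, seed=None):
        self.regular = deque(maxlen=capacity)   # all transitions
        self.positive = deque(maxlen=capacity)  # successful transitions only
        self.p_pos = p_pos                      # chance of drawing from the positive pool
        self.rng = random.Random(seed)

    def add(self, transition, positive=False):
        # Every transition enters the regular pool; successful ones
        # are additionally stored in the positive pool.
        self.regular.append(transition)
        if positive:
            self.positive.append(transition)

    def sample(self, batch_size):
        # Each draw picks a pool first, then a uniform transition from it;
        # falls back to the regular pool while the positive pool is empty.
        batch = []
        for _ in range(batch_size):
            use_pos = bool(self.positive) and self.rng.random() < self.p_pos
            pool = self.positive if use_pos else self.regular
            batch.append(self.rng.choice(pool))
        return batch
```

Raising `p_pos` over the course of training (or, as the abstract suggests, adjusting the sampling probability after the positive pool has grown) is one way such a scheme can trade exploration of ordinary transitions against exploitation of known-good ones.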

Key words: AUV, path planning, A* algorithm, reinforcement learning, behavior cloning

CLC Number: