Journal of System Simulation ›› 2025, Vol. 37 ›› Issue (11): 2754-2767.doi: 10.16182/j.issn1004731x.joss.24-0678
Xing Lijing1, Li Min1, Zeng Xiangguang1, Zhang Ping2, Peng Bei2
Received:2024-06-26
Revised:2024-09-11
Online:2025-11-18
Published:2025-11-27
Contact: Li Min
Xing Lijing, Li Min, Zeng Xiangguang, Zhang Ping, Peng Bei. AUV Path Planning Based on Behavior Cloning and Improved DQN in Partially Unknown Environments[J]. Journal of System Simulation, 2025, 37(11): 2754-2767.
Table 2
Comparison of different models' performance after 2 000 training epochs
| Algorithm | Expanded experience pool | Random sampling | Conventional sampling | New sampling | | | | |
|---|---|---|---|---|---|---|---|---|
| EPRO_A_DQN | √ | × | × | √ | 8/10 | 9.486/75.86 | 0.193/1.540 | 3.62/28.97 |
| EPER_A_DQN | √ | × | √ | × | 5/10 | 9.730/48.64 | 0.224/1.120 | 4.40/22.00 |
| ER_A_DQN | √ | √ | × | × | 5/10 | 11.46/57.28 | 0.227/1.130 | 4.59/22.94 |
| PRO_A_DQN | × | × | × | √ | 4/10 | 12.24/48.97 | 0.289/1.157 | 5.42/21.66 |
| PER_A_DQN | × | × | √ | × | 4/10 | 15.63/62.53 | 0.292/1.168 | 4.84/19.34 |
| A_DQN | × | √ | × | × | 3/10 | 24.59/73.79 | 0.648/1.944 | 8.97/26.91 |
| DQN | × | × | × | × | 0/10 | -/46.43 | -/0.940 | -/18.53 |
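Table 2 distinguishes the DQN variants by how transitions are drawn from the replay buffer: uniform random sampling versus priority-based sampling. As a hedged illustration only (not the paper's implementation; all names and parameters here are hypothetical), the two strategies can be sketched as:

```python
import random
import numpy as np

class ReplayBuffer:
    """Minimal replay buffer sketching the two sampling strategies
    compared in Table 2: uniform random sampling vs. PER-style
    priority-proportional sampling. Illustrative only."""

    def __init__(self, capacity=10000):
        self.capacity = capacity
        self.transitions = []  # (state, action, reward, next_state, done)
        self.priorities = []   # one priority per stored transition

    def push(self, transition, priority=1.0):
        # Drop the oldest transition once the buffer is full.
        if len(self.transitions) >= self.capacity:
            self.transitions.pop(0)
            self.priorities.pop(0)
        self.transitions.append(transition)
        self.priorities.append(priority)

    def sample_random(self, batch_size):
        # Uniform sampling: every stored transition is equally likely.
        return random.sample(self.transitions, batch_size)

    def sample_prioritized(self, batch_size, alpha=0.6):
        # Priority-proportional sampling: transitions with larger
        # priority (e.g. TD error) are replayed more often.
        p = np.asarray(self.priorities, dtype=float) ** alpha
        p /= p.sum()
        idx = np.random.choice(len(self.transitions), size=batch_size,
                               replace=False, p=p)
        return [self.transitions[i] for i in idx]
```

The "expanded experience pool" column in the table refers to enlarging this buffer with additional (e.g. synthetic or expert) transitions before training; the sketch above covers only the sampling side of the comparison.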
Table 5
Comparison of time efficiency across five experimental groups in four scenarios
| Scenario | Algorithm | Trial 1 | Trial 2 | Trial 3 | Trial 4 | Trial 5 | Average time | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Ⅰ | BA_DQN | 0.33 | 21.6 | 0.30 | 25.5 | 0.32 | 24.6 | 0.31 | 17.1 | 0.31 | 29.2 | 0.314 | 23.60 | 11.8 |
| A* | 0.40 | 25.9 | 0.32 | 29.3 | 0.35 | 25.0 | 0.34 | 18.4 | 0.37 | 34.3 | 0.356 | 26.58 | ||
| Ⅱ | BA_DQN | 0.39 | 24.8 | 0.38 | 30.7 | 0.35 | 29.3 | 0.37 | 29.3 | 0.37 | 27.7 | 0.372 | 28.36 | 17.7 |
| A* | 0.42 | 30.4 | 0.49 | 33.8 | 0.39 | 32.7 | 0.46 | 34.5 | 0.50 | 34.1 | 0.452 | 33.10 | ||
| Ⅲ | BA_DQN | 0.59 | 44.3 | 0.56 | 39.5 | 0.59 | 40.3 | 0.54 | 44.6 | 0.49 | 38.1 | 0.554 | 41.36 | 19.7 |
| A* | 0.86 | 47.0 | 0.60 | 42.1 | 0.73 | 42.8 | 0.61 | 46.1 | 0.65 | 44.4 | 0.690 | 44.48 | ||
| Ⅳ | BA_DQN | 1.39 | 103.9 | 1.33 | 109.0 | 1.38 | 95.1 | 1.50 | 98.1 | 1.43 | 97.0 | 1.406 | 100.62 | 36.0 |
| A* | 1.85 | 104.7 | 1.86 | 129.3 | 2.53 | 97.5 | 2.46 | 114.3 | 2.29 | 116.8 | 2.198 | 112.52 | ||
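Table 5 benchmarks the proposed BA_DQN planner against A* on grid scenarios of increasing size. For context, a minimal 4-connected grid A* with a Manhattan heuristic (a generic sketch, not the paper's baseline code; the grid encoding and function name are assumptions) looks like:

```python
import heapq

def astar(grid, start, goal):
    """Shortest path on a 4-connected grid via A* with a Manhattan
    heuristic. grid: 2-D list where 0 = free cell, 1 = obstacle.
    Returns the path as a list of (row, col) cells, or None."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_heap = [(h(start), 0, start)]  # (f = g + h, g, cell)
    came_from, g_cost = {}, {start: 0}
    while open_heap:
        _, g, cur = heapq.heappop(open_heap)
        if cur == goal:
            # Reconstruct the path by walking parents back to start.
            path = [cur]
            while cur in came_from:
                cur = came_from[cur]
                path.append(cur)
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and grid[nxt[0]][nxt[1]] == 0):
                ng = g + 1
                if ng < g_cost.get(nxt, float("inf")):
                    g_cost[nxt] = ng
                    came_from[nxt] = cur
                    heapq.heappush(open_heap, (ng + h(nxt), ng, nxt))
    return None  # no path exists
```

A*'s per-query search cost grows with map size, which is consistent with the widening planning-time gap from Scenario I to IV in the table; a trained policy instead amortizes that cost into offline training.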