Research on Control Strategy for Shortest Time Occupancy of AUV Based on Improved TD3

doi:10.16182/j.issn1004731x.joss.25-0682

Abstract

Existing occupancy models do not fully account for interference from time-varying ocean currents or for task time constraints, and AUVs lack real-time motion control. To address these issues, a shortest-time occupancy method based on quantile regression and distributional TD3 (DTD3) is proposed. Bayesian inference is used to identify the hydrodynamic parameters, and the kinematic and dynamic models of the AUV are established. The shortest-time occupancy equation is then constructed and solved for the occupancy target point and occupancy time. A first-order Gauss-Markov process is introduced to simulate the time-varying ocean current environment, and control strategies for AUV occupancy are trained with the DTD3 algorithm in ocean current scenarios of different intensities. Simulation results indicate that the method is robust and adaptable under dynamic ocean current interference. In particular, under strong ocean currents, the strategy convergence rate improves by 30% over the TD3 baseline, and the accuracy and success rate of AUV occupancy improve by 63% and 20%, respectively.

Key words: autonomous underwater vehicle, underwater occupancy, TD3 algorithm, Bayesian inference, shortest occupancy time

CLC Number: TP242.6

Ren Wenzhe, Li Min, Zeng Xiangguang, Zhang Tao, Xie Dijie, Peng Bei. Research on Control Strategy for Shortest Time Occupancy of AUV Based on Improved TD3[J]. Journal of System Simulation, 2026, 38(6): 1684-1698.

Table 1

Range and distribution of dynamic parameters

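Parameter	Physical meaning	Typical range	Prior distribution
$X_u$	Surge linear damping coefficient	(-10, 10)	Normal, $\mathcal{N}(0, 3.33)$
$X_{uu}$	Surge quadratic damping coefficient	(5, 20)	Half-normal, $\mathcal{HN}(2.5)$
$Y_v$	Sway linear damping coefficient	(0, 50)	Half-normal, $\mathcal{HN}(8.33)$
$Y_{vv}$	Sway quadratic damping coefficient	(100, 3000)	Half-normal, $\mathcal{HN}(483.33)$
$N_r$	Yaw linear damping coefficient	(0, 20)	Half-normal, $\mathcal{HN}(3.33)$
$N_{rr}$	Yaw quadratic damping coefficient	(10, 300)	Half-normal, $\mathcal{HN}(48.33)$
$Y_r$	Coupling damping coefficient	(0, 20)	Half-normal, $\mathcal{HN}(3.33)$
$N_v$	Coupling damping coefficient	(0, 100)	Half-normal, $\mathcal{HN}(16.66)$
$X_{\dot{u}}$	Surge added mass	(-10, -0.5)	Negative half-normal, $\mathcal{HN}(1.58)$
$Y_{\dot{v}}$	Sway added mass	(-100, -10)	Negative half-normal, $\mathcal{HN}(15)$
$N_{\dot{r}}$	Yaw added moment of inertia	(-20, -1)	Negative half-normal, $\mathcal{HN}(3.16)$
$Y_{\dot{r}}$	Coupling added mass	(-10, -1)	Negative half-normal, $\mathcal{HN}(1.5)$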
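
The prior scales in Table 1 appear to follow a simple pattern: each one equals about one sixth of the width of the parameter's typical range (for example, the range (100, 3000) gives 2900/6 ≈ 483.33). The source does not state this rule, so the sketch below treats it as an inference from the tabulated values only; the helper names `prior_scale` and `max_abs_deviation` are hypothetical.

```python
# Assumption: scale = (upper - lower) / 6. This is inferred from the values
# listed in Table 1 and is not stated by the source.

def prior_scale(lower: float, upper: float) -> float:
    """Return one sixth of the range width (hypothetical helper)."""
    return (upper - lower) / 6.0

# Each entry: parameter name -> ((lower, upper), scale listed in Table 1)
TABLE_1 = {
    "X_u": ((-10.0, 10.0), 3.33),
    "X_uu": ((5.0, 20.0), 2.5),
    "Y_v": ((0.0, 50.0), 8.33),
    "Y_vv": ((100.0, 3000.0), 483.33),
    "N_r": ((0.0, 20.0), 3.33),
    "N_rr": ((10.0, 300.0), 48.33),
    "Y_r": ((0.0, 20.0), 3.33),
    "N_v": ((0.0, 100.0), 16.66),
    "X_udot": ((-10.0, -0.5), 1.58),
    "Y_vdot": ((-100.0, -10.0), 15.0),
    "N_rdot": ((-20.0, -1.0), 3.16),
    "Y_rdot": ((-10.0, -1.0), 1.5),
}

def max_abs_deviation(table) -> float:
    """Largest gap between the width/6 rule and the listed scale."""
    return max(
        abs(prior_scale(lo, hi) - listed)
        for (lo, hi), listed in table.values()
    )

if __name__ == "__main__":
    for name, ((lo, hi), listed) in TABLE_1.items():
        print(f"{name}: rule={prior_scale(lo, hi):.2f} listed={listed}")
```

Under this reading, a prior with scale width/6 places nearly all of its mass inside the stated range, which may be why the ranges and scales line up so closely.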

Table 2

Posterior predictive values of dynamic parameters

Parameter	Posterior predictive value	Parameter	Posterior predictive value
$X_u$	0.169	$Y_r$	5.017
$X_{uu}$	8.109	$N_v$	36.099
$Y_v$	9.766	$X_{\dot{u}}$	-1.399
$Y_{vv}$	200.666	$Y_{\dot{v}}$	-38.537
$N_r$	5.007	$N_{\dot{r}}$	-8.877
$N_{rr}$	15.058	$Y_{\dot{r}}$	-2.538

Table 3

$R^2$ values of the dynamic parameters

Input	$R^2(u)$	$R^2(v)$	$R^2(r)$	Mean
Constant	0.9999	0.9999	0.9995	0.9998
Sinusoidal	0.9999	0.9998	0.9999	0.9999
Square wave	0.9999	0.9995	0.9996	0.9997
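
The source does not show how the $R^2$ values in Table 3 were computed. A minimal sketch, assuming the standard coefficient of determination between a reference velocity trace and the trace predicted by the identified model, might look like this; the function name `r_squared` and the sample data are illustrative only.

```python
# Assumption: R^2 = 1 - SS_res / SS_tot (standard coefficient of
# determination). The paper's exact definition is not given in this page.

def r_squared(y_true, y_pred) -> float:
    """Coefficient of determination for two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("sequences must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0.0:
        raise ValueError("y_true is constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot

if __name__ == "__main__":
    reference = [0.0, 0.5, 1.0, 1.5, 2.0]       # made-up surge speeds, m/s
    predicted = [0.01, 0.49, 1.02, 1.48, 2.0]   # made-up model output
    print(round(r_squared(reference, predicted), 4))
```

A value close to 1 means the residual error is small relative to the variance of the reference signal, which is the sense in which the table's entries indicate a good fit.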

Table 4

Training hyperparameters

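Parameter	Value
Policy network learning rate	1×10^-4
Value network learning rate	3×10^-4
Discount factor $γ$	0.99
Soft update coefficient $ς$	5×10^-3
Replay buffer capacity	1×10^5
Batch size	512
Maximum training episodes	4000
Maximum steps per episode	460
Number of neurons	256
Number of quantiles	16
Huber loss threshold $κ$	1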
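
Table 4 lists 16 quantiles and a Huber threshold $κ = 1$, which suggests a quantile Huber loss of the kind used in quantile-regression distributional reinforcement learning. The paper's exact loss is not shown here, so the following is a sketch of the standard form under that assumption; `huber` and `quantile_huber_loss` are hypothetical names, and plain Python lists stand in for network outputs.

```python
# Assumption: standard quantile Huber loss with quantile midpoints
# tau_i = (2i + 1) / (2N), i = 0..N-1. Not taken from the source.

def huber(u: float, kappa: float = 1.0) -> float:
    """Huber loss: quadratic for |u| <= kappa, linear beyond."""
    a = abs(u)
    return 0.5 * u * u if a <= kappa else kappa * (a - 0.5 * kappa)

def quantile_huber_loss(pred, target, kappa: float = 1.0) -> float:
    """Sum over predicted quantiles of the mean loss over target samples."""
    n = len(pred)
    total = 0.0
    for i, theta in enumerate(pred):
        tau = (2 * i + 1) / (2 * n)          # midpoint of the i-th quantile bin
        for t in target:
            u = t - theta                     # pairwise TD error
            indicator = 1.0 if u < 0 else 0.0
            total += abs(tau - indicator) * huber(u, kappa) / kappa
    return total / len(target)

if __name__ == "__main__":
    n_quantiles = 16                          # value from Table 4
    pred = [i / n_quantiles for i in range(n_quantiles)]
    target = [x + 0.1 for x in pred]          # made-up target quantiles
    print(quantile_huber_loss(pred, target, kappa=1.0))
```

The asymmetric weight `abs(tau - indicator)` is what pushes each output toward its own quantile level, while the Huber term keeps gradients bounded for large errors.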

Table 5

Values of ocean current model parameters

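Parameter	Value
Ocean current rate of change $β$	0.01
Ocean current random disturbance amplitude $σ_w$	5×10^-3
Ocean current update interval/s	5
Maximum speed of weak current $v_{\min}$/(m/s)	0.1
Maximum speed of strong current $v_{\max}$/(m/s)	1.0
Simulation step/s	0.1
Maximum simulation time/s	46
Maximum AUV thrust/N	34
Maximum AUV torque/(N·m)	10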
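
The parameters in Table 5 can be plugged into a first-order Gauss-Markov model of the current speed. The source does not give its discretization, so the sketch below assumes the common continuous form $\dot{V}_c + βV_c = w$ with $w \sim \mathcal{N}(0, σ_w)$, an explicit Euler step, noise resampled once per update interval, and clipping to $[0, v_{\max}]$. All of these choices, the initial speed, and the name `simulate_current_speed` are assumptions for illustration.

```python
import random

def simulate_current_speed(n_steps=460, dt=0.1, beta=0.01, sigma_w=5e-3,
                           v_max=1.0, v0=0.5, update_interval=5.0, seed=0):
    """Euler simulation of dV/dt + beta * V = w (assumed form)."""
    rng = random.Random(seed)
    # Number of simulation steps between noise resamples (5 s / 0.1 s = 50).
    steps_per_update = max(1, round(update_interval / dt))
    v, w = v0, 0.0
    speeds = []
    for k in range(n_steps):
        if k % steps_per_update == 0:
            w = rng.gauss(0.0, sigma_w)   # new driving noise sample
        v += dt * (-beta * v + w)         # explicit Euler update
        v = min(max(v, 0.0), v_max)       # keep speed inside [0, v_max]
        speeds.append(v)
    return speeds

if __name__ == "__main__":
    strong = simulate_current_speed(v_max=1.0, v0=0.5)
    weak = simulate_current_speed(v_max=0.1, v0=0.05)
    print(len(strong), max(strong), max(weak))
```

With 460 steps of 0.1 s, one call covers the 46 s maximum simulation time listed in the table.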

Table 6

Setting of reward function parameters

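Parameter	Value
Distance reward coefficient $ω_1$	2
Heading angle reward coefficient $ω_2$	8
Speed reward coefficient $ω_3$	10
Yaw penalty coefficients $ω_4$, $ω_5$	10
Completion reward $σ_1$	500
Distance progress reward coefficient $ω_6$	100
Distance error $d$/m	3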

Table 7

Table 8

Table 9

Scenario	Metric	DTD3	TD3
Weak current	Time/s	44.4	45.0
	Coordinates/m	(-44.50, 72.13)	(-44.96, 71.81)
	Time error/s	0.61	1.21
	Coordinate error/m	2.83	2.91
Strong current	Time/s	44.1	46.0
	Coordinates/m	(-45.24, 71.80)	(-41.51, 68.17)
	Time error/s	0.31	2.21
	Coordinate error/m	2.83	7.79
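
The strong-current rows above seem consistent with the roughly 63% accuracy improvement quoted in the abstract, if that figure is read as the relative reduction in coordinate error from TD3 to DTD3. The source does not state the formula, so this is only a plausibility check under that assumption; `relative_reduction` is a hypothetical helper.

```python
# Assumption: "improvement" = (baseline - improved) / baseline.
# The paper's own definition is not given in this page.

def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional reduction of an error metric relative to the baseline."""
    if baseline == 0.0:
        raise ValueError("baseline must be non-zero")
    return (baseline - improved) / baseline

if __name__ == "__main__":
    # Coordinate errors in metres, taken from the table above.
    strong = relative_reduction(baseline=7.79, improved=2.83)
    weak = relative_reduction(baseline=2.91, improved=2.83)
    print(f"strong current: {strong:.1%}, weak current: {weak:.1%}")
```

Under the same reading, the gap between the two algorithms is much smaller in the weak-current case, which matches the abstract's emphasis on high current intensity.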

Algorithm	Occupancy coordinates/m	Occupancy time/s	Actual coordinates/m	Actual time/s	Mean coordinate error/m	Mean time error/s
DTD3	(-41.71, 55.11)	34.6	(-39.77, 53.07)	35.3	2.86	1.2
	(-63.55, 121.38)	68.5	(-61.98, 119.03)	70.5
	(-32.49, 74.58)	40.7	(-30.47, 72.47)	41.9
TD3	(-69.91, 101.51)	61.6	(-68.41, 98.92)	62.6	3.00	1.7
	(-36.52, 68.18)	38.6	(-35.42, 65.48)	40.3
	(-50.92, 102.99)	57.4	(-52.30, 100.21)	59.6
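
The coordinate error in the table above can be read as the Euclidean distance between the computed occupancy point and the position actually reached. That reading is an assumption, and the reported means may cover more trials than the three rows listed per algorithm, so the sketch below is a consistency check rather than a reproduction; the function names are hypothetical.

```python
import math

def coordinate_error(target, actual) -> float:
    """Euclidean distance between two (x, y) points in metres."""
    return math.hypot(target[0] - actual[0], target[1] - actual[1])

def mean_coordinate_error(pairs) -> float:
    """Average Euclidean error over (target, actual) pairs."""
    return sum(coordinate_error(t, a) for t, a in pairs) / len(pairs)

# (occupancy point, actual point) for the three DTD3 rows listed above.
DTD3_ROWS = [
    ((-41.71, 55.11), (-39.77, 53.07)),
    ((-63.55, 121.38), (-61.98, 119.03)),
    ((-32.49, 74.58), (-30.47, 72.47)),
]

# The same for the three TD3 rows.
TD3_ROWS = [
    ((-69.91, 101.51), (-68.41, 98.92)),
    ((-36.52, 68.18), (-35.42, 65.48)),
    ((-50.92, 102.99), (-52.30, 100.21)),
]

if __name__ == "__main__":
    print(f"DTD3: {mean_coordinate_error(DTD3_ROWS):.2f} m")
    print(f"TD3:  {mean_coordinate_error(TD3_ROWS):.2f} m")
```

For these rows the three-trial means land close to the tabulated 2.86 m and 3.00 m, which supports the Euclidean reading of the error column.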

Algorithm	Occupancy coordinates/m	Occupancy time/s	Actual coordinates/m	Actual time/s	Mean coordinate error/m	Mean time error/s
DTD3	(-21.02, 28.35)	17.6	(-16.23, 27.27)	20.8	5.62	3.6
	(-49.37, 64.14)	40.5	(-44.96, 62.07)	44.3
	(-36.33, 58.32)	34.4	(-32.53, 52.36)	38.5
TD3	(-23.00, 34.61)	20.7	(-13.19, 20.21)	24.9	11.08	4.1
	(-57.24, 80.49)	49.3	(-55.25, 73.94)	53.5
	(-36.36, 50.96)	31.2	(-29.20, 45.51)	35.4