Research on Control Strategy for Shortest Time Occupancy of AUV Based on Improved TD3

doi:10.16182/j.issn1004731x.joss.25-0682

Abstract:

Existing occupancy models do not fully account for time-varying ocean current disturbances or task time constraints, and AUVs lack real-time motion control. To address these issues, a shortest-time occupancy method based on a quantile-regression distributional TD3 is proposed. Bayesian inference is used to identify the hydrodynamic parameters, and the kinematic and dynamic models of the AUV are established. A shortest-time occupancy equation is constructed and solved for the occupancy target point and the occupancy time. A first-order Gauss-Markov process is introduced to simulate the time-varying ocean current environment, and AUV occupancy control strategies are trained with the distributional TD3 algorithm in current scenarios of different intensities. The simulation results indicate that the method is robust and adaptable under dynamic ocean current disturbances. In particular, when the current intensity is high, compared with the TD3 baseline algorithm, the policy convergence speed improves by 30%, and the AUV occupancy accuracy and success rate improve by 63% and 20%, respectively.

Key words: autonomous underwater vehicle, underwater occupancy, TD3 algorithm, Bayesian inference, shortest occupancy time

CLC number: TP242.6

Ren Wenzhe, Li Min, Zeng Xiangguang, et al. Research on Control Strategy for Shortest Time Occupancy of AUV Based on Improved TD3[J]. Journal of System Simulation, 2026, 38(6): 1684-1698.

Figures/Tables (24)


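Table 1

Ranges and prior distributions of the dynamic parameters

Parameter	Physical meaning	Typical range	Prior distribution
$X_u$	Surge linear damping coefficient	(-10, 10)	Normal, $N(0, 3.33)$
$X_{uu}$	Surge quadratic damping coefficient	(5, 20)	Half-normal, $HN(2.5)$
$Y_v$	Sway linear damping coefficient	(0, 50)	Half-normal, $HN(8.33)$
$Y_{vv}$	Sway quadratic damping coefficient	(100, 3 000)	Half-normal, $HN(483.33)$
$N_r$	Yaw linear damping coefficient	(0, 20)	Half-normal, $HN(3.33)$
$N_{rr}$	Yaw quadratic damping coefficient	(10, 300)	Half-normal, $HN(48.33)$
$Y_r$	Coupled damping coefficient	(0, 20)	Half-normal, $HN(3.33)$
$N_v$	Coupled damping coefficient	(0, 100)	Half-normal, $HN(16.66)$
$X_{\dot{u}}$	Surge added mass	(-10, -0.5)	Negative half-normal, $HN(1.58)$
$Y_{\dot{v}}$	Sway added mass	(-100, -10)	Negative half-normal, $HN(15)$
$N_{\dot{r}}$	Yaw added moment of inertia	(-20, -1)	Negative half-normal, $HN(3.16)$
$Y_{\dot{r}}$	Coupled added mass	(-10, -1)	Negative half-normal, $HN(1.5)$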
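The scale parameters listed in Table 1 appear to follow a three-sigma style rule: each one is close to one sixth of the width of the corresponding range. This is an observation drawn from the tabulated numbers, not a rule stated in the source. A minimal Python check, where the names `PRIORS` and `implied_scale` are hypothetical:

```python
# Ranges and prior scales copied from Table 1.
# Assumption (not stated in the source): scale = (upper - lower) / 6.
PRIORS = {
    "X_u":    ((-10.0, 10.0), 3.33),
    "X_uu":   ((5.0, 20.0), 2.5),
    "Y_v":    ((0.0, 50.0), 8.33),
    "Y_vv":   ((100.0, 3000.0), 483.33),
    "N_r":    ((0.0, 20.0), 3.33),
    "N_rr":   ((10.0, 300.0), 48.33),
    "Y_r":    ((0.0, 20.0), 3.33),
    "N_v":    ((0.0, 100.0), 16.66),
    "X_udot": ((-10.0, -0.5), 1.58),
    "Y_vdot": ((-100.0, -10.0), 15.0),
    "N_rdot": ((-20.0, -1.0), 3.16),
    "Y_rdot": ((-10.0, -1.0), 1.5),
}


def implied_scale(lower, upper):
    """Scale implied by treating the range as roughly +/- 3 sigma wide."""
    return (upper - lower) / 6.0


# Largest absolute difference between the listed and the implied scale.
max_gap = max(abs(implied_scale(*rng) - scale) for rng, scale in PRIORS.values())
print(f"largest gap between listed and implied scale: {max_gap:.3f}")
```

If the gap stays within rounding, the priors can be read as weakly informative distributions that place almost all probability mass inside the typical range.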

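Table 2

Posterior predicted values of the dynamic parameters

Parameter	Posterior predicted value	Parameter	Posterior predicted value
$X_u$	0.169	$Y_r$	5.017
$X_{uu}$	8.109	$N_v$	36.099
$Y_v$	9.766	$X_{\dot{u}}$	-1.399
$Y_{vv}$	200.666	$Y_{\dot{v}}$	-38.537
$N_r$	5.007	$N_{\dot{r}}$	-8.877
$N_{rr}$	15.058	$Y_{\dot{r}}$	-2.538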

Table 3

$R^2$ values of the dynamic parameters

Input	$R^2(u)$	$R^2(v)$	$R^2(r)$	Mean
Constant	0.999 9	0.999 9	0.999 5	0.999 8
Sinusoidal	0.999 9	0.999 8	0.999 9	0.999 9
Square wave	0.999 9	0.999 5	0.999 6	0.999 7
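Table 3 reports one $R^2$ value per velocity component ($u$, $v$, $r$) for each input type. The source does not give the formula in this section, so the sketch below assumes the standard coefficient of determination. The function name `r_squared` and the sample data are made up for illustration and are not taken from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0.0:
        raise ValueError("R^2 is undefined for a constant reference signal")
    return 1.0 - ss_res / ss_tot


# Illustrative (made-up) surge-speed samples, not data from the paper.
u_ref = [0.0, 0.5, 0.9, 1.2, 1.4]
u_model = [0.0, 0.51, 0.89, 1.21, 1.39]
print(round(r_squared(u_ref, u_model), 4))
```

A value close to 1 means the identified model reproduces almost all of the variance in the reference response.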


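Table 4

Training hyperparameters

Parameter	Value
Policy network learning rate	$1×10^{-4}$
Value network learning rate	$3×10^{-4}$
Discount factor $γ$	0.99
Soft update coefficient $ς$	$5×10^{-3}$
Replay buffer capacity	$1×10^{5}$
Batch size	512
Maximum number of training episodes	4 000
Maximum steps per episode	460
Number of neurons	256
Number of quantiles	16
Huber loss threshold $κ$	1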
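The last two rows of Table 4 (16 quantiles, Huber threshold $κ = 1$) suggest a quantile-regression critic. The paper's exact critic loss is not given in this section, so the following is a minimal pure-Python sketch that assumes the standard quantile Huber loss from quantile-regression distributional reinforcement learning, with quantile midpoints $τ_i = (2i+1)/(2N)$. The function names are hypothetical.

```python
def huber(u, kappa=1.0):
    """Huber loss L_kappa(u): quadratic near zero, linear in the tails."""
    a = abs(u)
    return 0.5 * u * u if a <= kappa else kappa * (a - 0.5 * kappa)


def quantile_huber_loss(pred_quantiles, target_samples, kappa=1.0):
    """Quantile Huber loss between N predicted quantiles and M target samples."""
    n = len(pred_quantiles)
    # Quantile midpoints tau_i = (2i + 1) / (2N) (assumed, not from the source).
    taus = [(2 * i + 1) / (2.0 * n) for i in range(n)]
    total = 0.0
    for tau, theta in zip(taus, pred_quantiles):
        row = 0.0
        for z in target_samples:
            u = z - theta  # pairwise TD error
            # Asymmetric weight: over- and under-estimation are penalised
            # differently depending on the quantile level tau.
            weight = abs(tau - (1.0 if u < 0 else 0.0))
            row += weight * huber(u, kappa) / kappa
        total += row / len(target_samples)  # mean over target samples
    return total  # summed over predicted quantiles


n_quantiles = 16  # number of quantiles from Table 4
pred = [i / n_quantiles for i in range(n_quantiles)]  # made-up predictions
target = [0.5] * n_quantiles                          # made-up targets
print(quantile_huber_loss(pred, target, kappa=1.0))
```

In a full training loop this loss would be computed on batched tensors with automatic differentiation; the nested loops here only make the pairwise structure explicit.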

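Table 5

Ocean current model parameters

Parameter	Value
Current rate of change $β$	0.01
Random disturbance amplitude of the current $σ_w$	$5×10^{-3}$
Current update interval/s	5
Maximum speed of weak current $v_{\min}$/(m/s)	0.1
Maximum speed of strong current $v_{\max}$/(m/s)	1.0
Simulation step/s	0.1
Maximum simulation time/s	46
Maximum AUV thrust/N	34
Maximum AUV torque/(N·m)	10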
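The abstract states that a first-order Gauss-Markov process simulates the time-varying current, and Table 5 lists its parameters. The discretisation used in the paper is not shown in this section, so the sketch below assumes the common form $\dot{V}_c + β V_c = w$ with $w \sim N(0, σ_w^2)$, an explicit Euler update applied once per update interval, and clamping of the speed to $[0, v_{\max}]$. The function name `simulate_current` and the initial speeds are hypothetical.

```python
import random


def simulate_current(v0, v_max, beta=0.01, sigma_w=5e-3,
                     update_interval=5.0, t_end=46.0, dt=0.1, seed=0):
    """Return a list of current speeds (m/s), one per simulation step."""
    rng = random.Random(seed)  # seeded for reproducible runs
    steps = round(t_end / dt)
    steps_per_update = round(update_interval / dt)
    v = v0
    history = []
    for k in range(steps):
        # The current is held constant between updates and refreshed
        # every `update_interval` seconds (assumed piecewise-constant model).
        if k > 0 and k % steps_per_update == 0:
            w = rng.gauss(0.0, sigma_w)
            v += (-beta * v + w) * update_interval  # explicit Euler step
            v = min(max(v, 0.0), v_max)             # keep speed in [0, v_max]
        history.append(v)
    return history


# Initial speeds below are made up; v_max values come from Table 5.
weak = simulate_current(v0=0.05, v_max=0.1)
strong = simulate_current(v0=0.5, v_max=1.0)
print(len(weak), min(strong), max(strong))
```

With a 0.1 s step and a 46 s horizon this yields 460 samples per episode, which matches the maximum number of steps per episode in Table 4.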

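Table 6

Reward function parameter settings

Parameter	Value
Distance reward coefficient $ω_1$	2
Heading angle reward coefficient $ω_2$	8
Speed reward coefficient $ω_3$	10
Yaw deviation penalty coefficients $ω_4$, $ω_5$	10
Completion reward $σ_1$	500
Distance progress reward coefficient $ω_6$	100
Distance error $d$/m	3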



Scenario	Metric	DTD3	TD3
Weak current	Time/s	44.4	45.0
	Coordinates/m	(-44.50, 72.13)	(-44.96, 71.81)
	Time error/s	0.61	1.21
	Coordinate error/m	2.83	2.91
Strong current	Time/s	44.1	46.0
	Coordinates/m	(-45.24, 71.80)	(-41.51, 68.17)
	Time error/s	0.31	2.21
	Coordinate error/m	2.83	7.79
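In the strong-current rows of the table above, the coordinate error falls from 7.79 m with TD3 to 2.83 m with DTD3. The relative reduction is about 63.7%, which appears consistent with the 63% accuracy improvement quoted in the abstract, although the source does not state how that figure was derived. A short check, with the hypothetical helper `relative_reduction`:

```python
def relative_reduction(baseline, improved):
    """Fractional reduction of an error metric relative to the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


strong_td3_error_m = 7.79   # TD3, strong current, from the table above
strong_dtd3_error_m = 2.83  # DTD3, strong current, from the table above
reduction = relative_reduction(strong_td3_error_m, strong_dtd3_error_m)
print(f"{reduction:.1%}")  # prints 63.7%
```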

Algorithm	Occupancy coordinates/m	Occupancy time/s	Actual coordinates/m	Actual time/s	Mean coordinate error/m	Mean time error/s
DTD3	(-41.71, 55.11)	34.6	(-39.77, 53.07)	35.3	2.86	1.2
	(-63.55, 121.38)	68.5	(-61.98, 119.03)	70.5
	(-32.49, 74.58)	40.7	(-30.47, 72.47)	41.9
TD3	(-69.91, 101.51)	61.6	(-68.41, 98.92)	62.6	3.00	1.7
	(-36.52, 68.18)	38.6	(-35.42, 65.48)	40.3
	(-50.92, 102.99)	57.4	(-52.30, 100.21)	59.6
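The mean coordinate errors in the table above can be recomputed from its rows. The sketch below assumes the coordinate error is the Euclidean distance between the computed occupancy point and the position actually reached; the source does not define it in this section. The names `TRIALS` and `mean_position_error` are hypothetical.

```python
import math

# (computed occupancy point, position actually reached) in metres,
# copied from the three trials per algorithm in the table above.
TRIALS = {
    "DTD3": [
        ((-41.71, 55.11), (-39.77, 53.07)),
        ((-63.55, 121.38), (-61.98, 119.03)),
        ((-32.49, 74.58), (-30.47, 72.47)),
    ],
    "TD3": [
        ((-69.91, 101.51), (-68.41, 98.92)),
        ((-36.52, 68.18), (-35.42, 65.48)),
        ((-50.92, 102.99), (-52.30, 100.21)),
    ],
}


def mean_position_error(pairs):
    """Mean Euclidean distance between target and reached positions."""
    return sum(math.dist(target, actual) for target, actual in pairs) / len(pairs)


for name, pairs in TRIALS.items():
    print(name, round(mean_position_error(pairs), 2))
```

Under this assumption the recomputed means land within about 0.01 m of the tabulated 2.86 m and 3.00 m, so the tabulated means are consistent with a plain Euclidean error averaged over the three trials.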

Algorithm	Occupancy coordinates/m	Occupancy time/s	Actual coordinates/m	Actual time/s	Mean coordinate error/m	Mean time error/s
DTD3	(-21.02, 28.35)	17.6	(-16.23, 27.27)	20.8	5.62	3.6
	(-49.37, 64.14)	40.5	(-44.96, 62.07)	44.3
	(-36.33, 58.32)	34.4	(-32.53, 52.36)	38.5
TD3	(-23.00, 34.61)	20.7	(-13.19, 20.21)	24.9	11.08	4.1
	(-57.24, 80.49)	49.3	(-55.25, 73.94)	53.5
	(-36.36, 50.96)	31.2	(-29.20, 45.51)	35.4