Journal of System Simulation ›› 2023, Vol. 35 ›› Issue (11): 2410-2418. doi: 10.16182/j.issn1004731x.joss.22-0632


  • First author: Jia Zhengxuan (1990-), male, engineer, M.S.; research interests include deep reinforcement learning, modeling and simulation, and intelligent decision-making. E-mail: danny2006_2007@126.com

Imitative Generation of Optimal Guidance Law Based on Reinforcement Learning

Jia Zhengxuan 1, Lin Tingyu 1, Xiao Yingying 1, Shi Guoqiang 1, Wang Hao 2, Zeng Bi 2, Ou Yiming 1, Zhao Pengpeng 1

  1. Beijing Simulation Center, Beijing 100854, China
  2. Beijing Institute of Electronic System Engineering, Beijing 100854, China
  • Received: 2022-06-09 Revised: 2022-08-03 Online: 2023-11-25 Published: 2023-11-23


Abstract:

Against the background of high-speed maneuvering target interception, an optimal guidance law generation method for head-on interception that does not depend on target acceleration estimation is proposed based on deep reinforcement learning, and its effectiveness is verified through simulation experiments. The simulation results show that the proposed method achieves head-on interception of high-speed maneuvering targets in 3D space and greatly relaxes the requirement for target estimates that carry strong uncertainty, making it more widely applicable than the optimal control method.
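The core idea, imitatively generating guidance commands from expert examples so that online use needs no target acceleration estimate, can be sketched as plain behavioral cloning. The sketch below is illustrative only: a proportional-navigation (PN) teacher and a linear-in-features regressor stand in for the paper's optimal head-on law and deep network, neither of which is specified in the abstract; the state ranges and navigation constant are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Randomized engagement states (assumed ranges, for illustration only).
vc = rng.uniform(500.0, 2000.0, 5000)    # closing speed Vc, m/s
lam = rng.uniform(-0.1, 0.1, 5000)       # line-of-sight rate, rad/s

def pn_teacher(vc, lam, N=4.0):
    """Stand-in 'expert': PN lateral acceleration a = N * Vc * LOS-rate."""
    return N * vc * lam

a_cmd = pn_teacher(vc, lam)              # expert commands to imitate

# Behavioral cloning reduces to supervised regression onto expert commands.
# The feature map includes the product term the PN teacher actually uses.
X = np.column_stack([vc, lam, vc * lam, np.ones_like(vc)])
w, *_ = np.linalg.lstsq(X, a_cmd, rcond=None)

mse = float(((X @ w - a_cmd) ** 2).mean())
print(f"imitation MSE: {mse:.3e}")       # near zero: learner matches teacher
```

Once fitted, the learned map from observable state (closing speed, LOS rate) to commanded acceleration is evaluated online with no estimate of the target's own acceleration, which is the property the abstract emphasizes.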

Key words: reinforcement learning, optimal guidance, imitation learning, head-on interception, guidance and control
