Journal of System Simulation (系统仿真学报) ›› 2025, Vol. 37 ›› Issue (2): 508-516. doi: 10.16182/j.issn1004731x.joss.23-1219

• Research Paper •

舰船防空反导的目标分配方法研究

费帅迪1, 蔡长龙1, 刘飞2, 陈明晖3, 刘晓明3   

  1. 西安工业大学 兵器科学与技术学院,陕西 西安 710021
  2. 陕西启明信息技术有限公司,陕西 西安 710021
  3. 中国北方车辆研究所,北京 100072
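  • Received: 2023-10-10 Revised: 2023-11-10 Online: 2025-02-14 Published: 2025-02-10
  • Corresponding author: Cai Changlong
  • About the first author: Fei Shuaidi (b. 2000), male, master's student; research interests: reinforcement learning and intelligent decision-making.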

Research on the Target Allocation Method for Air Defense and Anti-missile Defense of Naval Ships

Fei Shuaidi1, Cai Changlong1, Liu Fei2, Chen Minghui3, Liu Xiaoming3   

  1. School of Armament Science and Technology, Xi'an Technological University, Xi'an 710021, China
  2. Shaanxi Qiming Information Technology Co., Ltd., Xi'an 710021, China
  3. China North Vehicle Research Institute, Beijing 100072, China
  • Received: 2023-10-10 Revised: 2023-11-10 Online: 2025-02-14 Published: 2025-02-10
  • Contact: Cai Changlong

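Abstract (translated from the Chinese):

To address the multiple types of state information and their time-series correlation in the dynamic weapon-target assignment problem, a dynamic weapon-target assignment method based on an improved deep reinforcement learning algorithm is proposed. A multi-input assignment model of target missiles and interceptor units is constructed. A multi-input state space is designed and, together with the problem model, used to establish a Markov decision process. A feature extraction network combining multi-input information processing with a gated recurrent network is designed to improve the extraction of state information, retaining the state information that is needed and forgetting what is unimportant. A multi-head attention mechanism is introduced into the policy network to improve the model's performance and convergence speed. Experimental results show that the proposed dynamic weapon-target assignment method achieves good convergence speed and interception gain.

Keywords: air defense and anti-missile, target assignment, weapon-target assignment, deep reinforcement learning, attention mechanism, Advantage Actor-Critic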

Abstract:

To address the multiple types of state information and the time-series correlation of that information in the dynamic weapon-target assignment problem, a dynamic weapon-target assignment method based on an improved deep reinforcement learning algorithm is proposed. A multi-input assignment model of the target missile-interceptor unit, interceptor unit, and defense unit under multi-wave targets and multiple phases is constructed. A multi-input state space is designed, and a Markov decision process is established in conjunction with the problem model. A feature extraction network combining multi-input information processing with a gated recurrent network is designed; it improves the extraction of state information by retaining the necessary state information and forgetting the unimportant. A multi-head attention mechanism is introduced into the policy network to improve the expressiveness and convergence speed of the model. The experimental results show that the proposed dynamic weapon-target assignment method achieves good convergence speed and interception gain.

Keywords: air defense and anti-missile, target assignment, weapon-target assignment, deep reinforcement learning (DRL), attention mechanism, Advantage Actor-Critic
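
The abstract credits a gated recurrent network with retaining necessary state information and forgetting the unimportant. As a rough illustration of that gating idea only, the sketch below implements one step of a textbook gated recurrent unit with a scalar hidden state. It is not the authors' feature extraction network: the function names `gru_step` and `encode_sequence`, the parameter dictionary, and the scalar simplification are all assumptions made for this example.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x: float, h_prev: float, p: dict) -> float:
    """One step of a scalar GRU cell (hypothetical helper, textbook equations).

    Convention used here: an update gate z near 0 keeps the old state,
    while z near 1 overwrites it with the new candidate state.
    """
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])  # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])  # reset gate
    # Candidate state: the reset gate scales how much of the past is visible.
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    # Blend of old state and candidate, weighted by the update gate.
    return (1.0 - z) * h_prev + z * h_cand


def encode_sequence(xs, p: dict, h0: float = 0.0) -> float:
    """Fold a time series of scalar observations into a final hidden state."""
    h = h0
    for x in xs:
        h = gru_step(x, h, p)
    return h


# Illustrative parameters (assumed values, not taken from the paper).
params = {"wz": 0.5, "uz": 0.5, "bz": 0.0,
          "wr": 0.5, "ur": 0.5, "br": 0.0,
          "wh": 1.0, "uh": 1.0, "bh": 0.0}
summary = encode_sequence([0.2, -0.1, 0.4], params)
```

In a learned model the weights would be vectors and matrices fitted during training, so the gates decide per feature what to keep from earlier observations; the scalar version only makes the retain-or-overwrite arithmetic visible.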

CLC Number: