Journal of System Simulation, 2026, Vol. 38, Issue (2): 447-459. doi: 10.16182/j.issn1004731x.joss.25-0472

• Game Theory and Wargaming Evaluation •

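Intelligent Air Combat Decision-making Method Based on BiGRU and Priority Dynamic Sampling

Ding Zhengkun, Liu Jiaqi, Xu Junzheng, Xu Yuezhu, Wang Xingmei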

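  1. College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, Heilongjiang, China
  • Received: 2025-05-26 Revised: 2025-10-25 Issue date: 2026-02-18 Release date: 2026-02-11
  • Corresponding author: Xu Yuezhu
  • About the first author: Ding Zhengkun (born 2000), male, Ph.D. candidate; research interests include reinforcement learning and adversarial games between intelligent agents.
  • Funding:
    Fundamental Research Funds for the Central Universities (3072024XX0602)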

Intelligent Air Combat Decision-making Method Based on BiGRU and Priority Dynamic Sampling

Ding Zhengkun, Liu Jiaqi, Xu Junzheng, Xu Yuezhu, Wang Xingmei   

  1. College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China
  • Received: 2025-05-26 Revised: 2025-10-25 Online: 2026-02-18 Published: 2026-02-11
  • Contact: Xu Yuezhu

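Abstract:

To address the low utilization efficiency of experience data and the difficulty of setting the learning rate in multi-agent reinforcement learning algorithms, a BiGRU-based multi-agent proximal policy optimization algorithm with priority sampling and a dynamic learning rate is proposed. A BiGRU network is introduced to strengthen the policy network's ability to model temporal information; a priority partial sampling mechanism is introduced to improve the utilization of high-value experience data; and an improved Adam optimizer is adopted to adjust the learning rate dynamically, which addresses the difficulty of setting the learning rate. Simulation results show that the algorithm improves convergence speed, stability, and combat win rate, providing a new optimization scheme for multi-agent air combat decision-making.

Key words: air combat policy optimization, priority partial sampling, dynamic learning rate, deep reinforcement learning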

Abstract:

Current multi-agent reinforcement learning algorithms use experience data inefficiently, and their learning rates are difficult to set. To address these issues, this paper proposes a BiGRU-based multi-agent proximal policy optimization (PPO) algorithm with priority sampling and a dynamic learning rate. The algorithm incorporates a BiGRU network to strengthen the policy network's ability to model temporal information. A priority partial sampling mechanism improves the utilization of high-value experience data. In addition, an improved Adam optimizer adjusts the learning rate dynamically, which addresses the difficulty of configuring it. Simulation results show that the algorithm improves convergence speed, stability, and combat win rate, offering a new optimization scheme for multi-agent air combat decision-making.

Key words: air combat policy optimization, priority partial sampling, dynamic learning rate, deep reinforcement learning
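
The abstract names a priority partial sampling mechanism but does not define it. The sketch below is only one plausible reading, not the authors' method: it assumes that a fixed fraction of each minibatch is drawn without replacement with probability proportional to a per-sample priority (for example, the absolute advantage), and that the remainder is drawn uniformly from what is left. The function name `priority_partial_sample` and the parameters `priority_fraction` and `alpha` are hypothetical and do not come from the paper.

```python
import random


def priority_partial_sample(priorities, batch_size, priority_fraction=0.5,
                            alpha=0.6, rng=None):
    """Pick batch_size distinct indices: part by priority, the rest uniformly.

    Hypothetical illustration; the split rule and the exponent alpha are
    assumptions, not details taken from the paper.
    """
    rng = rng or random.Random()
    n = len(priorities)
    if not 0 < batch_size <= n:
        raise ValueError("batch_size must be between 1 and len(priorities)")
    if not 0.0 <= priority_fraction <= 1.0:
        raise ValueError("priority_fraction must lie in [0, 1]")

    # Number of samples drawn by priority; the rest are drawn uniformly.
    n_priority = int(batch_size * priority_fraction)

    # Small constant keeps zero-priority samples selectable.
    weights = [(abs(p) + 1e-6) ** alpha for p in priorities]

    remaining = list(range(n))
    chosen = []
    for _ in range(n_priority):
        # Weighted draw without replacement over the indices still available.
        pick = rng.choices(remaining,
                           weights=[weights[i] for i in remaining], k=1)[0]
        remaining.remove(pick)
        chosen.append(pick)

    # Uniform draw without replacement for the rest of the minibatch.
    chosen.extend(rng.sample(remaining, batch_size - n_priority))
    return chosen


# Usage: priorities here stand in for |advantage| values of stored transitions.
advantages = [0.1, -5.0, 0.2, 3.0, -0.05, 0.0]
batch = priority_partial_sample(advantages, batch_size=4, rng=random.Random(0))
print(sorted(batch))
```

Keeping a uniformly sampled share in each minibatch is a common way to stop low-priority experience from being ignored entirely; whether the paper does this is not stated in the abstract.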

CLC number: