Journal of System Simulation ›› 2026, Vol. 38 ›› Issue (2): 447-459. doi: 10.16182/j.issn1004731x.joss.25-0472

• Wargaming and Simulation-Based Evaluation •

Intelligent Air Combat Decision-making Method Based on BiGRU and Priority Dynamic Sampling

Ding Zhengkun, Liu Jiaqi, Xu Junzheng, Xu Yuezhu, Wang Xingmei   

  1. College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China
  • Received: 2025-05-26 Revised: 2025-10-25 Online: 2026-02-18 Published: 2026-02-11
  • Contact: Xu Yuezhu

Abstract:

Current multi-agent reinforcement learning algorithms suffer from low efficiency in utilizing experience data and difficulty in setting appropriate learning rates. To address these issues, this paper proposed a BiGRU-based multi-agent PPO algorithm with priority sampling and a dynamic learning rate. The algorithm incorporated a BiGRU network to enhance the policy network's ability to model temporal information. A priority partial sampling mechanism was introduced to improve the utilization efficiency of high-value experience data. Additionally, an improved Adam optimizer with dynamic learning-rate adjustment was employed to address the difficulty of learning-rate configuration. Simulation results demonstrate that the algorithm significantly improves convergence speed, stability, and combat win rate, offering a novel optimization scheme for multi-agent air combat decision-making.
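The abstract does not specify how the priority partial sampling or the dynamic learning-rate adjustment are implemented. As a rough illustration under assumed details (the function names and the `frac`, `up`, and `down` parameters are hypothetical, not from the paper), one common reading of these two mechanisms is: sample part of each minibatch from the highest-priority transitions and the rest uniformly, and scale the optimizer's learning rate up or down according to the recent loss trend:

```python
import numpy as np

def priority_partial_sample(priorities, batch_size, frac=0.5, rng=None):
    """Pick `frac` of the batch from the highest-priority transitions
    (e.g. largest |TD error|) and the remainder uniformly from the rest."""
    rng = rng or np.random.default_rng()
    n_prior = int(batch_size * frac)
    order = np.argsort(priorities)[::-1]          # indices, descending priority
    top = order[:n_prior]                         # deterministic high-value part
    rest = order[n_prior:]                        # uniform part avoids overfitting
    rand = rng.choice(rest, size=batch_size - n_prior, replace=False)
    return np.concatenate([top, rand])

def dynamic_lr(base_lr, prev_loss, cur_loss, up=1.05, down=0.7):
    """Grow the learning rate slightly while the loss improves,
    shrink it when the loss rises (a simple dynamic-LR heuristic
    that could wrap Adam's step size)."""
    return base_lr * (up if cur_loss < prev_loss else down)
```

In practice the sampled indices would select transitions from the experience buffer for the PPO update, and `dynamic_lr` would be applied between epochs to the Adam step size; both are sketches of the general idea, not the authors' exact scheme.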

Key words: air combat policy optimization, priority partial sampling, dynamic learning rate, deep reinforcement learning

CLC Number: