Journal of System Simulation ›› 2024, Vol. 36 ›› Issue (9): 2208-2218.doi: 10.16182/j.issn1004731x.joss.23-0584


Research on Autonomous Decision-making in Air-combat Based on Improved Proximal Policy Optimization

Qian Dianwei1, Qi Hongmin1, Liu Zhen2, Zhou Zhiming2, Yi Jianqiang2   

  1. School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China
  2. Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
  • Received:2023-05-18 Revised:2023-06-16 Online:2024-09-15 Published:2024-09-30
  • Contact: Zhou Zhiming

Abstract:

To address the high information redundancy and slow convergence of traditional reinforcement learning in air-combat autonomous decision-making, a proximal policy optimization (PPO) method for autonomous air-combat decision-making based on dual observation and composite reward is proposed. A dual observation space, taking interaction information as the main input and individual feature information as a supplement, is designed to reduce the influence of redundant battlefield information on the training efficiency of the decision model. A composite reward function combining result rewards and process rewards is designed to improve the convergence speed. The generalized advantage estimator (GAE) is applied in the PPO algorithm to improve the accuracy of advantage-function estimation. Simulation results show that the decision-making model of the proposed method can make precise autonomous decisions and complete air-combat tasks according to the battlefield situation in two experimental scenarios: against a fixed-program opponent and against a matrix-gaming opponent.
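The generalized advantage estimator mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation; the paper's network architecture and hyperparameters are not given in the abstract, so the discount factor `gamma` and trace parameter `lam` below are illustrative defaults commonly used with PPO:

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Compute GAE advantages for one episode (illustrative sketch).

    rewards: per-step rewards r_t, length T
    values:  value estimates V(s_t), length T + 1
             (the extra entry bootstraps the terminal state).
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    # Sweep backwards, accumulating discounted TD errors.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```

In PPO, these advantages replace the raw returns in the clipped surrogate objective; the `lam` parameter trades off the bias of one-step TD estimates against the variance of full Monte Carlo returns.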

Key words: reinforcement learning, air-combat autonomous decision-making, dual observation, composite reward, generalized advantage estimator
