Research on Autonomous Decision-making in Air-combat Based on Improved Proximal Policy Optimization

doi:10.16182/j.issn1004731x.joss.23-0584

Abstract

Abstract:

To address the problems of high information redundancy and slow convergence speed of traditional reinforcement learning in air-combat autonomous decision-making applications, a proximal policy optimization air-combat autonomous decision-making method, based on dual observation and composite reward is proposed. A dual observation space, which contains interaction information as the main information and individual feature information as a supplement, was designed to reduce the influence of redundant battlefield information on the training efficiency of the decision model. A composite reward function combining result reward and process reward was designed to improve convergence speed. The generalized advantage estimator was applied in the proximal policy optimization strategy algorithm to improve the accuracy of advantage function estimation. Simulation results show that the method decision-making model can make precise autonomous decisions and complete air-combat tasks according to the battlefield situation in two types of experimental scenarios: against fixed-programmed and matrix gaming opponents.

Key words: RL, air-combat autonomous decision-making, dual observation, composite reward, generalized advantage estimator

CLC Number:

TP391.9

Qian Dianwei, Qi Hongmin, Liu Zhen, Zhou Zhiming, Yi Jianqiang. Research on Autonomous Decision-making in Air-combat Based on Improved Proximal Policy Optimization[J]. Journal of System Simulation, 2024, 36(9): 2208-2218.

Figures/Tables 16

Fig. 1

Fig. 2

Table 1

Judgement of conditions for a game

结束条件	对局结果
$Δ d < 3 000 m, α < 30 °, β > 150 °$	红机获胜
$z B > 0$	红机获胜
$Δ d < 3 000 m, α > 150 °, β < 30 °$	蓝机获胜
$z R > 0$	蓝机获胜
$Δ d < 3000 m, α < 30 °, β < 30 °$	双方平局
其他情况	对局继续

Table 1

Fig. 3

Fig. 4

Table 2

Hyperparameter setting of policy network training

参数名	参数值
batch-size	256
折扣因子 $η$	0.99
学习率	0.000 3
GAE- $λ$	0.99
裁剪系数 $ε$	0.2
网络隐藏层的维度	256
不可逃逸区角度 $α s h o t / (°)$	30
最大逃逸角度 $α m a x / (°)$	150
经验缓存区的容量	2 048

Table 2

Fig. 5

Fig. 6

Fig. 7

Fig. 8

Fig. 9

Table 3

Fig. 10

Fig. 11

Fig. 12

Table 4

References 16

1	孙智孝, 杨晟琦, 朴海音, 等. 未来智能空战发展综述[J]. 航空学报, 2021, 42(8): 28-42.
	Sun Zhixiao, Yang Shengqi, Haiyin Piao, et al. A Survey of Air Combat Artificial Intelligence[J]. Acta Aeronautica et Astronautica Sinica, 2021, 42(8): 28-42.
2	Mitchell R R. Embedding a Tactics Expert System into Air Combat Simulation Software[C]//Proceedings of the IEEE National Aerospace and Electronics Conference. Piscataway, NJ, USA: IEEE, 1989: 1027-1033.
3	Li Qiuni, Wang Fawei, Yang Wanping, et al. Air Combat Maneuver Strategy Algorithm Based on Two-layer Game Decision-making and Distributed Double Game Trees MCTS Under Uncertain Information[J]. Electronics, 2022, 11(16): 2608.
4	Ernest N, Carroll D, Schumacher C, et al. Genetic Fuzzy Based Artificial Intelligence for Unmanned Combat Aerial Vehicle Control in Simulated Air Combat Missions[J]. Journal of Defense Management, 2016, 6(1): 1000144.
5	Hu Dongyuan, Yang Rennong, Zuo Jialiang, et al. Application of Deep Reinforcement Learning in Maneuver Planning of Beyond-visual-range Air Combat[J]. IEEE Access, 2021, 9: 32282-32297.
6	Wang Xu, Wang Sen, Liang Xingxing, et al. Deep Reinforcement Learning: A Survey[J/OL]. IEEE Transactions on Neural Networks and Learning Systems. (2022-09-28) [2022-11-18]. .
7	黄晓冬, 苑海涛, 毕敬, 等. 基于DQN的海战场舰船路径规划及仿真[J]. 系统仿真学报, 2021, 33(10): 2440-2448.
	Huang Xiaodong, Yuan Haitao, Bi Jing, et al. DQN-based Path Planning Method and Simulation for Submarine and Warship in Naval Battlefield[J]. Journal of System Simulation, 2021, 33(10): 2440-2448.
8	曾贲, 房霄, 孔德帅, 等. 一种数据驱动的对抗博弈智能体建模方法[J]. 系统仿真学报, 2021, 33(12): 2838-2845
	Zeng Ben, Fang Xiao, Kong Deshuai, et al. A Data-driven Modeling Method for Game Adversity Agent[J]. Journal of System Simulation, 2021, 33(12): 2838-2845.
9	Nam Tran Duc, Quan Tran Hai, Dat Nguyen Van, et al. An Approach for UAV Indoor Obstacle Avoidance Based on AI Technique with Ensemble of ResNet8 and Res-DQN[C]//2019 6th NAFOSTED Conference on Information and Computer Science (NICS). Piscataway, NJ, USA: IEEE, 2019: 330-335.
10	李永丰, 史静平, 章卫国, 等. 深度强化学习的无人作战飞机空战机动决策[J]. 哈尔滨工业大学学报, 2021, 53(12): 33-41.
	Li Yongfeng, Shi Jingping, Zhang Weiguo, et al. Maneuver Decision of UCAV in Air Combat Based on Deep Reinforcement Learning[J]. Journal of Harbin Institute of Technology, 2021, 53(12): 33-41.
11	王昱, 任田君, 范子琳. 基于引导Minimax-DDQN的无人机空战机动决策[J]. 计算机应用, 2023, 43(8): 2636-2643.
	Wang Yu, Ren Tianjun, Fan Zilin. Air Combat Maneuver Decision-making of Unmanned Aerial Vehicle Based on Guided Minimax-DDQN[J]. Journal of Computer Applications, 2023, 43(8): 2636-2643.
12	Hu Dongyuan, Zuo Jialiang, Zhang Wanze, et al. Research on Application of LSTM-QDN in Intelligent Air Combat Simulation[J]. Journal of Physics: Conference Series, 2021, 1746(1): 012028.
13	Jing Xianyong, Hou Manyi, Wu Gaolong, et al. Research on Maneuvering Decision Algorithm Based on Improved Deep Deterministic Policy Gradient[J]. IEEE Access, 2022, 10: 92426-92445.
14	Pope A P, Ide J S, Mićović Daria, et al. Hierarchical Reinforcement Learning for Air Combat at DARPA's AlphaDogfight Trials[J]. IEEE Transactions on Artificial Intelligence, 2023, 4(6): 1371-1385.
15	Schulman J, Wolski F, Dhariwal P, et al. Proximal Policy Optimization Algorithms[EB/OL]. (2017-08-28) [2022-11-27]. .
16	康扬名. 多无人机协同对抗系统智能决策与控制研究[D]. 北京: 中国科学院大学, 2020.
	Kang Yangming. Research on Intelligent Decision and Control of Multi-UAV Cooperative Countermeasure System[D]. Beijing: University of Chinese Academy of Sciences, 2020.

空战对手模型	红机胜率	蓝机胜率	平局概率
平飞运动	89.9	3.0	7.1
水平盘旋运动	88.5	3.2	8.3

[1]	Qin Long, Huang Hesong, Yin Lujia, Ai Chuan, Zhang Qi, Li Xinmeng. Intelligent Competition Platform and Mode Driven by Cloud-native Simulation [J]. Journal of System Simulation, 2026, 38(4): 988-1003.
[2]	Wu Shuxia, Zhang Junjie, Chen Delong, Chen Zheyi. Resource-efficient Continuous Learning Framework for Edge Real-time Video Analytics [J]. Journal of System Simulation, 2026, 38(2): 294-306.
[3]	Zhang Ziyao, Ji Yunfeng. Simulation of Robotic Arm Ball-catching Strategy Based on Curriculum RL of Transformer [J]. Journal of System Simulation, 2026, 38(2): 321-331.
[4]	Wang Bingkun, Wang Yue, Yang Mei, Zhang Pengnian, Fan Bohao, Tang Jie. Strike Strategy Planning Method of Unmanned Ground Vehicles Based on Improved PPO Algorithm [J]. Journal of System Simulation, 2026, 38(2): 372-386.
[5]	Liu Quan, Wang Yu, Liu Linyue, Chen Hao, Huang Jian. Knowledge Closed-loop Driving-based Intelligent Game Confrontation Simulation [J]. Journal of System Simulation, 2026, 38(2): 416-432.
[6]	Xu Risheng, Yang Linyao, Qin Yuanqi, Wang Xiao, Sun Changyin. Knowledge-enhanced LLM-based Method for Regional Traffic Signal Control [J]. Journal of System Simulation, 2026, 38(2): 518-531.
[7]	Zhu Yuning, Yang Meng, Chen Tianyue, Meng Weiliang. VRBT: VR Badminton Training with Multitask Injury Alerts based on Lightweight 3D Skeletal Reconstruction [J]. Journal of System Simulation, 2026, 38(1): 225-234.
[8]	Zhang Wei, Sheng Wei, Cao Yidan, Zhao Tingsheng. Research on 3D Visualization of Safety Monitoring and Early Warning for Steel Continuous Casting Scenarios [J]. Journal of System Simulation, 2025, 37(8): 1991-2003.
[9]	Xie Yong, Gao Hailong, Chen Yutao, Wang Huanjiang. Optimization of Product Oil Distribution with Multiple Trips and Multiple Due Dates under Dynamic Demand [J]. Journal of System Simulation, 2025, 37(8): 2016-2029.
[10]	Wang Ziyi, Zhang Kai, Qian Dianwei, Liu Yuzhen. A DRL⁃based Approach for Distributed Equipment Nodes Selection [J]. Journal of System Simulation, 2025, 37(6): 1565-1573.
[11]	Zhang Sen, Dai Qiangqiang. UAV Path Planning Based on Improved Deep Deterministic Policy Gradients [J]. Journal of System Simulation, 2025, 37(4): 875-881.
[12]	Li Min, Zhang Sen, Zeng Xiangguang, Wang Gang, Zhang Tongwei, Xie Dijie, Ren Wenzhe, Zhang Tao. Trajectory Planning of Quadruped Robot Over Obstacle with Single Leg Based on Deep Reinforcement Learning [J]. Journal of System Simulation, 2025, 37(4): 895-909.
[13]	Wang He, Xu Jianing, Yan Guangyu. Research on Pedestrian Avoidance Strategy for AGV Based on Deep Reinforcement Learning [J]. Journal of System Simulation, 2025, 37(3): 595-606.
[14]	Zhang Bin, Lei Yonglin, Li Qun, Gao Yuan, Chen Yong, Zhu Jiajun, Bao Chenlong. Reinforcement Learning Modeling of Missile Penetration Decision Based on Combat Simulation [J]. Journal of System Simulation, 2025, 37(3): 763-774.
[15]	Huang Sijin, Wen Jia, Chen Zheyi. Intelligent Service Migration towards MEC-based IoV Systems [J]. Journal of System Simulation, 2025, 37(2): 379-391.