Journal of System Simulation ›› 2026, Vol. 38 ›› Issue (2): 433-446.doi: 10.16182/j.issn1004731x.joss.25-0621

• Wargaming and Simulation-Based Evaluation •

Intelligent Decision-making Method in Imbalanced Air Combat Based on Asymmetric Self-play

Zheng Wei1, Tang Jiahao1, Xiong Xiaoping2, Fan Xin1   

  1. School of Software Engineering, Nanchang Hangkong University, Nanchang 330063, China
    2.Civil Aviation Administration of China Jiangxi Aircraft Airworthiness Certification Center, Nanchang 330038, China
  • Received:2025-06-30 Revised:2025-09-19 Online:2026-02-18 Published:2026-02-11
  • Contact: Tang Jiahao

Abstract:

To address the strategy-convergence problem caused by role homogenization in traditional self-play for imbalanced air combat, an intelligent decision-making method based on asymmetric self-play was proposed. The method decoupled tactics from control through a hierarchical reinforcement learning framework and designed differentiated reward functions for the advantaged and disadvantaged sides. Bidirectional independent policy pools were constructed to promote the co-evolution of strategies, and the proximal policy optimization (PPO) algorithm was used to train the model. Experiments in 1v1 weapon-imbalanced and 2v1 numerically imbalanced scenarios show that, compared with symmetric self-play, the proposed method increases the kill rate of the advantaged side by up to 12% and the survival rate of the disadvantaged side by up to 40%, while overall effectiveness in multi-agent combat is also significantly improved. These results verify that the asymmetric design enhances the specialized combat capability and tactical diversity of intelligent agents in imbalanced air combat.
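The training loop described above can be illustrated with a minimal sketch. This is not the paper's implementation: the `Policy` class, `train_step`, and the snapshot schedule are hypothetical placeholders standing in for PPO-trained hierarchical policies. The sketch only shows the structural idea of bidirectional independent policy pools, where each side trains against opponents sampled from the other side's pool of frozen snapshots.

```python
import random

class Policy:
    """Toy stand-in for a PPO-trained hierarchical policy network."""
    def __init__(self, role, version=0):
        self.role = role          # "advantaged" or "disadvantaged"
        self.version = version

def train_step(policy, opponent):
    """Placeholder for one PPO update against a sampled opponent.

    In the real method each side would optimize its own differentiated
    reward function here; this stub just bumps a version counter.
    """
    return Policy(policy.role, policy.version + 1)

def asymmetric_self_play(iterations=5, snapshot_every=2, seed=0):
    """Asymmetric self-play with bidirectional independent policy pools.

    Each side keeps its own pool of frozen snapshots and trains against
    opponents drawn from the *other* side's pool, so the two roles
    co-evolve without being homogenized into one shared policy.
    """
    random.seed(seed)
    adv, dis = Policy("advantaged"), Policy("disadvantaged")
    adv_pool, dis_pool = [adv], [dis]
    for it in range(1, iterations + 1):
        adv = train_step(adv, random.choice(dis_pool))  # advantaged vs. dis pool
        dis = train_step(dis, random.choice(adv_pool))  # disadvantaged vs. adv pool
        if it % snapshot_every == 0:                    # freeze periodic snapshots
            adv_pool.append(adv)
            dis_pool.append(dis)
    return adv_pool, dis_pool

adv_pool, dis_pool = asymmetric_self_play()
```

Keeping the two pools independent, rather than a single shared pool as in symmetric self-play, is what lets each role specialize against the opposing role's historical strategies.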

Key words: imbalanced air combat, asymmetric self-play, bidirectional policy pool, hierarchical reinforcement learning, proximal policy optimization

CLC Number: