Journal of System Simulation ›› 2026, Vol. 38 ›› Issue (2): 433-446. doi: 10.16182/j.issn1004731x.joss.25-0621

• Game Theory and Wargaming Evaluation •

Intelligent Decision-making Method in Imbalanced Air Combat Based on Asymmetric Self-play

Zheng Wei1, Tang Jiahao1, Xiong Xiaoping2, Fan Xin1

  1. School of Software Engineering, Nanchang Hangkong University, Nanchang 330063, China
  2. Civil Aviation Administration of China Jiangxi Aircraft Airworthiness Certification Center, Nanchang 330038, China
  • Received: 2025-06-30 Revised: 2025-09-19 Online: 2026-02-18 Published: 2026-02-11
  • Contact: Tang Jiahao
  • About the first author: Zheng Wei (b. 1982), male, professor, master's supervisor, Ph.D.; research interests include artificial intelligence, big data analysis, and intelligent air combat decision-making.
  • Funding: National Natural Science Foundation of China (62467005)

Abstract:

Traditional self-play tends to produce convergent strategies in imbalanced air combat because both sides are trained in homogeneous roles. To address this problem, an intelligent decision-making method based on asymmetric self-play is proposed. A hierarchical reinforcement learning framework decouples tactics from control, and differentiated reward functions are designed for the advantaged and disadvantaged sides. Bidirectional independent policy pools are constructed to promote the co-evolution of strategies, and the models are trained with the proximal policy optimization algorithm. Experiments in 1v1 weapon-imbalanced and 2v1 numerically imbalanced scenarios show that, compared with symmetric self-play, the proposed method increases the kill rate of the advantaged side by up to 12% and the survival rate of the disadvantaged side by up to 40%; overall effectiveness in multi-aircraft combat is also enhanced. These results verify the effectiveness of the asymmetric design in improving the specialized combat capability and tactical diversity of decision-making agents for imbalanced air combat.

Key words: imbalanced air combat, asymmetric self-play, bidirectional policy pool, hierarchical reinforcement learning, proximal policy optimization
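
The abstract names two mechanisms that a short sketch may help make concrete: role-specific reward functions and a separate policy pool for each side, with each learner training against snapshots sampled from the opposing pool. The Python below is a minimal illustration under stated assumptions, not the authors' implementation. The names `PolicyPool`, `role_reward`, `asymmetric_self_play`, and `update_fn`, the reward weights, the uniform opponent sampling, and the snapshot schedule are all hypothetical; `update_fn` stands in for a policy update such as one round of proximal policy optimization.

```python
import random
from dataclasses import dataclass, field


@dataclass
class PolicyPool:
    """Frozen snapshots of one role's policy (hypothetical structure)."""
    capacity: int = 10
    snapshots: list = field(default_factory=list)

    def add(self, snapshot):
        # Keep only the most recent `capacity` snapshots.
        self.snapshots.append(snapshot)
        if len(self.snapshots) > self.capacity:
            self.snapshots.pop(0)

    def sample(self, rng):
        # Uniform sampling is an assumption made for this sketch.
        return rng.choice(self.snapshots)


def role_reward(role, killed_enemy, survived):
    """Illustrative role-specific terminal reward; the weights are invented."""
    if role == "advantaged":
        # The advantaged side is rewarded mainly for achieving a kill.
        return 1.0 * killed_enemy + 0.2 * survived
    if role == "disadvantaged":
        # The disadvantaged side is rewarded mainly for surviving.
        return 0.2 * killed_enemy + 1.0 * survived
    raise ValueError(f"unknown role: {role}")


def asymmetric_self_play(iterations, snapshot_every, update_fn, rng):
    """Train two role-specific learners against each other's snapshot pools.

    `update_fn(role, policy, opponent)` returns the updated policy and is a
    placeholder for a real training step (for example, a PPO update).
    """
    roles = ("advantaged", "disadvantaged")
    pools = {role: PolicyPool() for role in roles}
    learners = {role: 0 for role in roles}  # integers stand in for parameters
    for role in roles:
        pools[role].add(learners[role])  # seed each pool with the initial policy

    for step in range(1, iterations + 1):
        for role, opponent_role in (roles, roles[::-1]):
            # Each learner faces a past policy of the opposite role only.
            opponent = pools[opponent_role].sample(rng)
            learners[role] = update_fn(role, learners[role], opponent)
        if step % snapshot_every == 0:
            for role in roles:
                pools[role].add(learners[role])
    return learners, pools
```

In this sketch the two pools never mix: the advantaged learner only meets disadvantaged snapshots and vice versa, which is one plausible reading of "bidirectional independent policy pools". Calling `asymmetric_self_play(6, 2, lambda role, p, opp: p + 1, random.Random(0))` exercises the loop with a dummy update that simply increments a version counter.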

CLC number: