Research on Multi-aircraft Air Combat Behavior Modeling Based on Hierarchical Intelligent Modeling Methods

doi:10.16182/j.issn1004731x.joss.23-FZ0824

Abstract:

To address the difficulty of force game decision-making under high-dimensional state-action space constraints in multi-aircraft air combat confrontation scenarios, a decision-generation strategy for force agents based on deep reinforcement learning is adopted. Situational cognition and reward generation algorithms for intelligent force gaming are proposed, and a hierarchical behavior modeling framework based on a hybrid intelligent modeling method is constructed. The approach addresses the sparse-reward difficulty in the reinforcement learning process and provides a feasible reinforcement learning training method for large-scale air combat problems involving many aircraft types and many elements.

Key words: combat simulation, multi-agent system, deep reinforcement learning (DRL), non-sparse reward function
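This page does not give the form of the non-sparse reward function. As one standard way to densify a sparse terminal reward, the sketch below uses potential-based reward shaping (Ng, Harada and Russell, 1999), which adds F = γΦ(s′) − Φ(s) to the environment reward. The distance-based potential `phi`, the 1e5 m scale, and the `State` fields are hypothetical illustrations, not the authors' reward design.

```python
import math
from dataclasses import dataclass


@dataclass
class State:
    # Hypothetical minimal state: one red aircraft and its assigned target (metres).
    x_r: float
    y_r: float
    x_b: float
    y_b: float


def phi(s: State, scale: float = 1e5) -> float:
    """Hypothetical potential: rises toward 0 as the red aircraft closes on its target."""
    return -math.hypot(s.x_r - s.x_b, s.y_r - s.y_b) / scale


def shaped_reward(r_env: float, s: State, s_next: State, gamma: float, done: bool) -> float:
    """Sparse environment reward plus the shaping term gamma * phi(s') - phi(s).

    By the usual convention the potential of a terminal state is taken as 0.
    """
    phi_next = 0.0 if done else phi(s_next)
    return r_env + gamma * phi_next - phi(s)
```

For example, closing from 50 km to 40 km with no environment reward and γ = 0.9 yields 0.9 × (−0.4) + 0.5 = 0.14, so the agent receives a learning signal long before any kill or loss occurs. Potential-based shaping is chosen here because it is known not to change which policies are optimal.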

CLC number: TP391.9

Wang Yukun, Wang Ze, Dong Liwei, et al. Research on Multi-aircraft Air Combat Behavior Modeling Based on Hierarchical Intelligent Modeling Methods[J]. Journal of System Simulation, 2023, 35(10): 2249-2261.

Figures and tables: 15 (Figures 1-11, Tables 1-4)

References (20)

1	Holcomb S D, Porter W K, Ault S V, et al. Overview on DeepMind and Its AlphaGo Zero AI[C]//Proceedings of the 2018 International Conference on Big Data and Education. New York, NY, USA: Association for Computing Machinery, 2018: 67-71.
2	Arulkumaran K, Cully A, Togelius J. AlphaStar: an Evolutionary Computation Perspective[C]//Proceedings of the Genetic and Evolutionary Computation Conference Companion. New York, NY, USA: Association for Computing Machinery, 2019: 314-315.
3	Berner C, Brockman G, Chan B, et al. Dota 2 With Large Scale Deep Reinforcement Learning[EB/OL]. (2019-12-13) [2023-05-10]. .
4	Yang Weiyi, Bai Chenjia, Cai Chao, et al. Survey on Sparse Reward in Deep Reinforcement Learning[J]. Computer Science, 2020, 47(3): 182-191.
5	Chen G. A New Framework for Multi-agent Reinforcement Learning-centralized Training and Exploration With Decentralized Execution via Policy Distillation[EB/OL]. (2019-10-21) [2022-11-02]. .
6	Lowe R, Wu Yi, Tamar A, et al. Multi-agent Actor-critic for Mixed Cooperative-competitive Environments[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY, USA: Curran Associates Inc., 2017: 6382-6393.
7	Marthi B. Automatic Shaping and Decomposition of Reward Functions[C]//Proceedings of the 24th International Conference on Machine Learning. New York, NY, USA: Association for Computing Machinery, 2007: 601-608.
8	Chen Jiayu, Zhang Yuanxin, Xu Yuanfan, et al. Variational Automatic Curriculum Learning for Sparse-reward Cooperative Multi-agent Problems[C]//35th Conference on Neural Information Processing Systems. Red Hook, NY, USA: Curran Associates Inc., 2021, 34: 9681-9693.
9	Hu Yujing, Wang Weixun, Jia Hangtian, et al. Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping[C]//34th Conference on Neural Information Processing Systems. Red Hook, NY, USA: Curran Associates Inc., 2020: 15931-15941.
10	Zhelo O, Zhang Jingwei, Tai Lei, et al. Curiosity-driven Exploration for Mapless Navigation With Deep Reinforcement Learning[EB/OL]. (2018-05-14) [2023-06-21]. .
11	Wang Xin, Chen Yudong, Zhu Wenwu. A Survey on Curriculum Learning[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(9): 4555-4576.
12	Hutsebaut-Buysse M, Mets K, Latré S. Hierarchical Reinforcement Learning: A Survey and Open Research Challenges[J]. Machine Learning & Knowledge Extraction, 2022, 4(1): 172-221.
13	Zhou Pan, Huang Jiangtao, Zhang Sheng, et al. Intelligent Air Combat Decision Making and Simulation Based on Deep Reinforcement Learning[J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(4): 94-107.
14	Li Yongfeng, Shi Jingping, Zhang Weiguo, et al. Maneuver Decision of UCAV in Air Combat Based on Deep Reinforcement Learning[J]. Journal of Harbin Institute of Technology, 2021, 53(12): 33-41.
15	Wang Zhuang, Li Hui, Wu Haolin, et al. Improving Maneuver Strategy in Air Combat by Alternate Freeze Games With a Deep Reinforcement Learning Algorithm[J]. Mathematical Problems in Engineering, 2020, 2020: 7180639.
16	Zhang Sheng, Du Xin, Xiao Juan, et al. Fixed-wing Aircraft 6-DOF Flight Control Based on Deep Reinforcement Learning[J]. Journal of Command and Control, 2022, 8(2): 179-188.
17	Sun Zhixiao, Yang Shengqi, Piao Haiyin, et al. A Survey of Air Combat Artificial Intelligence[J]. Acta Aeronautica et Astronautica Sinica, 2021, 42(8): 28-42.
18	Tashev B, Purcell M, McLaughlin B. Russia's Information Warfare: Exploring the Cognitive Dimension[J]. MCU Journal, 2019, 10(2): 129-147.
19	Hamfelt A, Karlsson M, Thierfelder T, et al. Beyond K-means: Clusters Identification for GIS[M]//Popovich V V, Claramunt C, Devogele T, et al. Information Fusion and Geographic Information Systems: Towards the Digital Ocean. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011: 93-105.
20	Rashid T, Farquhar G, Peng Bei, et al. Weighted QMIX: Expanding Monotonic Value Function Factorisation for Deep Multi-agent Reinforcement Learning[C]//34th Conference on Neural Information Processing Systems. Red Hook, NY, USA: Curran Associates Inc., 2020: 10199-10210.

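Rule antecedent	Rule consequent
Ammunition remaining && no target assigned && unassigned enemy aircraft exist	Assign the nearest enemy aircraft not yet locked as the target
No ammunition remaining	Do not assign a target
Ammunition remaining && target assigned && a closer enemy aircraft exists	Set the closer enemy aircraft as the new target
Ammunition remaining && target assigned && target destroyed	Reassign an interception target
Ammunition remaining && no target assigned && no unassigned enemy aircraft	Patrol the area designated by the scheduling layer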
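The target-assignment rule table above can be read as an ordered decision procedure evaluated for each red aircraft. A minimal sketch follows; the `Enemy`/`Fighter` classes, the string return values, and the `locked` bookkeeping are hypothetical. It also assumes that "a closer enemy aircraft" means one not already locked by another red aircraft, and that after a destroyed target is dropped the nearest-first rule is reapplied.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Enemy:
    ident: int
    x: float
    y: float
    destroyed: bool = False
    locked: bool = False  # True once some red aircraft has taken it as its target


@dataclass
class Fighter:
    x: float
    y: float
    ammo: int
    target: Optional[Enemy] = None


def dist(a, b) -> float:
    return math.hypot(a.x - b.x, a.y - b.y)


def nearest(f: Fighter, candidates: list) -> Optional[Enemy]:
    return min(candidates, key=lambda e: dist(f, e), default=None)


def assign_target(f: Fighter, enemies: list) -> str:
    """Evaluate the assignment rules for one fighter and return the action taken."""
    if f.ammo <= 0:
        return "no_assignment"                  # no ammunition remaining
    if f.target is not None and f.target.destroyed:
        f.target = None                         # target destroyed: reassign below
    free = [e for e in enemies if not e.destroyed and not e.locked]
    if f.target is None:
        e = nearest(f, free)
        if e is None:
            return "patrol"                     # nothing to assign: patrol the scheduled area
        e.locked, f.target = True, e            # nearest-first assignment
        return "assign"
    closer = nearest(f, [e for e in free if dist(f, e) < dist(f, f.target)])
    if closer is not None:                      # a closer enemy exists: switch targets
        f.target.locked = False
        closer.locked, f.target = True, closer
        return "retarget"
    return "keep"
```

Because rules are checked in a fixed order, the ammunition check dominates, and the patrol fallback only fires when no unlocked enemy is left to assign.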

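Rule antecedent	Rule consequent
Current state is area patrol && air interception order received	Perform air interception
Current state is air interception && interception target destroyed	Patrol the area specified by the network output
Insufficient fuel	Return to base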
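The state-transition rules above, in which the network output chooses the next patrol area and the rules handle the mode switches, can be sketched as a small finite state machine. The `Mode` names, the `Region` tuple, and the `fuel_min` threshold are hypothetical; giving low fuel priority over every other rule is also an assumption, since the table does not state rule precedence.

```python
from enum import Enum, auto
from typing import Optional

# Hypothetical patrol region: centre x, centre y, radius (metres).
Region = tuple[float, float, float]


class Mode(Enum):
    PATROL = auto()
    INTERCEPT = auto()
    RETURN_TO_BASE = auto()


def step_mode(mode: Mode, region: Optional[Region], *, intercept_order: bool,
              target_destroyed: bool, fuel: float, fuel_min: float,
              network_region: Region) -> tuple[Mode, Optional[Region]]:
    """Apply the transition rules once and return (new mode, active patrol region)."""
    if fuel < fuel_min:
        # Assumption: insufficient fuel overrides every other rule.
        return Mode.RETURN_TO_BASE, None
    if mode is Mode.PATROL and intercept_order:
        return Mode.INTERCEPT, None
    if mode is Mode.INTERCEPT and target_destroyed:
        # The upper (network) layer decides where to patrol next.
        return Mode.PATROL, network_region
    return mode, region
```

Keeping the region choice as an input rather than a rule lets the learned layer and the rule layer be trained and tested separately.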

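Parameter	Meaning
X_r	Red platform x-coordinate
Y_r	Red platform y-coordinate
LX_r	Red platform type
H_r	Red platform heading
D_num_r	Red platform ammunition load
A_r	Red fire-domain coverage ratio
I_r	Red detection (information) domain coverage
X_b	Blue platform x-coordinate
Y_b	Blue platform y-coordinate
LX_b	Blue platform type
H_b	Blue platform heading
A_b	Blue fire-domain coverage ratio
I_b	Blue detection (information) domain coverage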
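The state parameters above are per-platform features. A minimal sketch of flattening them into a fixed-length network input follows. The scaling constants, the number of platform types, the sin/cos heading encoding, and the zero padding for absent platforms are all assumptions for illustration, and the sketch is not claimed to reproduce the paper's 264-dimensional input.

```python
import math
from dataclasses import dataclass

N_TYPES = 3  # hypothetical number of platform types (LX) for one-hot encoding


@dataclass
class Platform:
    x: float           # X: x position, metres
    y: float           # Y: y position, metres
    ptype: int         # LX: platform type index, 0..N_TYPES-1
    heading: float     # H: heading, degrees
    fire_cov: float    # A: fire-domain coverage ratio, in [0, 1]
    detect_cov: float  # I: detection-domain coverage, assumed pre-normalised to [0, 1]
    ammo: int = 0      # D_num: ammunition load (listed for the red side only)


def encode(p: Platform, with_ammo: bool, area: float = 1e5, max_ammo: int = 8) -> list:
    feats = [p.x / area, p.y / area]                                 # scaled position
    feats += [1.0 if i == p.ptype else 0.0 for i in range(N_TYPES)]  # one-hot type
    rad = math.radians(p.heading)
    feats += [math.sin(rad), math.cos(rad)]                          # no 359/0 degree jump
    feats += [p.fire_cov, p.detect_cov]
    if with_ammo:
        feats.append(p.ammo / max_ammo)
    return feats


RED_DIM = 2 + N_TYPES + 2 + 2 + 1  # 10 features per red platform
BLUE_DIM = RED_DIM - 1             # blue platforms carry no ammunition feature


def build_obs(red: list, blue: list, max_red: int, max_blue: int) -> list:
    """Concatenate red then blue platform features, zero-padding absent slots."""
    obs = []
    for i in range(max_red):
        obs += encode(red[i], with_ammo=True) if i < len(red) else [0.0] * RED_DIM
    for i in range(max_blue):
        obs += encode(blue[i], with_ammo=False) if i < len(blue) else [0.0] * BLUE_DIM
    return obs
```

Fixed slot counts keep the input dimension constant as aircraft are destroyed, which a fixed-size network input layer requires.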

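Parameter	Meaning	Value
n_agents	Number of agents	2
obs_dim	Input state-space dimension	264
action_dim	Action-space dimension	64
batch_size	Batch size	512
gamma	Discount factor	0.9
replace_target_iter	Target-network parameter update period	200
lr	Learning rate	0.0005
epsilon	Exploration probability	1.0
epsilon_min	Minimum exploration probability	0.02
epsilon_decay	Exploration decay factor	0.9999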
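The exploration and target-network hyperparameters above can be illustrated with a short sketch. It assumes, since the table does not say, that `epsilon_decay` is applied multiplicatively once per decision step and that the target network is synchronised every `replace_target_iter` learning steps, as in standard DQN-style training; the function names are hypothetical.

```python
import math


def epsilon_schedule(eps: float = 1.0, eps_min: float = 0.02, decay: float = 0.9999):
    """Yield the exploration rate for each step: eps <- max(eps_min, eps * decay)."""
    while True:
        yield eps
        eps = max(eps_min, eps * decay)


def steps_to_min(eps: float = 1.0, eps_min: float = 0.02, decay: float = 0.9999) -> int:
    """Number of multiplicative decays before eps first reaches eps_min."""
    return math.ceil(math.log(eps_min / eps) / math.log(decay))


def should_sync_target(step: int, replace_target_iter: int = 200) -> bool:
    """True when online-network weights should be copied into the target network."""
    return step > 0 and step % replace_target_iter == 0
```

Under this per-step assumption, ε falls from 1.0 to its 0.02 floor after roughly 39,000 steps, which indicates how long the agents explore almost at random before the learned policy dominates action selection.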
