Dynamic Task Planning for Wargaming Based on Large Language Models

doi:10.16182/j.issn1004731x.joss.25-1096

Abstract

Wargaming tasks involve complex adversarial environments and strong uncertainty, which make intelligent decision-making difficult and leave task planning insufficiently dynamic. To address these problems, this paper proposes a hierarchical Agent collaborative decision-making framework based on the synergy of large and small models. A multi-level structure provides hierarchical decoupling and dynamic coordination of battlefield tasks. A memory management module is constructed, and a query optimization mechanism driven by large language models is introduced to dynamically perceive the decision-making process and query intent, performing semantic reconstruction and context completion of raw queries. A time-driven two-stage task planning process is designed to cover global task plan formulation, evaluation of the original tasks, and dynamic task updating. Experimental results indicate that a decision-making model centered on DeepSeek-V3 exhibits good task planning and instruction-following capabilities under this framework.

Key words: large language model, retrieval-augmented generation, intelligent decision-making, task planning, strategy generation, wargaming

CLC Number: TP391.9

Liu Yingang, Ma Ming, Zhang Ronghua. Dynamic Task Planning for Wargaming Based on Large Language Models[J]. Journal of System Simulation, 2026, 38(5): 1187-1204.

Figures/Tables 13 (Fig. 1 to Fig. 10, Table 1 to Table 3)

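Knowledge document type	Content ownership	Visibility label
Battlefield situation data	Globally shared	All
Initial task plan	Army part	Decision Hub Agent
Initial task plan	Army part	Army Command Agent
Initial task plan	Air force part	Decision Hub Agent
Initial task plan	Air force part	Air Force Command Agent
Improvement suggestions	Army part	Decision Hub Agent
Improvement suggestions	Army part	Army Command Agent
Improvement suggestions	Air force part	Decision Hub Agent
Improvement suggestions	Air force part	Air Force Command Agent
Action plan	Army part	Decision Hub Agent
Action plan	Army part	Army Command Agent
Action plan	Air force part	Decision Hub Agent
Action plan	Air force part	Air Force Command Agent
Review of improvement suggestions	Army part	Decision Hub Agent
Review of improvement suggestions	Army part	Army Command Agent
Review of improvement suggestions	Air force part	Decision Hub Agent
Review of improvement suggestions	Air force part	Air Force Command Agent
Review of action plan	Army part	Decision Hub Agent
Review of action plan	Army part	Army Command Agent
Review of action plan	Air force part	Decision Hub Agent
Review of action plan	Air force part	Air Force Command Agent
Preselected action plan	Army part	Decision Hub Agent
Preselected action plan	Army part	Army Command Agent
Preselected action plan	Air force part	Decision Hub Agent
Preselected action plan	Air force part	Air Force Command Agent
Final action plan	Globally shared	All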
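
The visibility labels in the table above suggest that each Agent retrieves only the knowledge documents tagged for it or tagged for everyone. The following is a minimal sketch of such a filter, not the paper's implementation: the `KnowledgeDoc` class, the `visible_docs` function, the English label strings, and the sample texts are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeDoc:
    """One stored knowledge document (hypothetical structure)."""
    doc_type: str    # e.g. "Initial task plan"
    ownership: str   # "Army part", "Air force part", or "Globally shared"
    visibility: str  # name of the Agent allowed to read it, or "All"
    text: str


def visible_docs(store, agent):
    """Return the documents whose visibility label is "All" or names `agent`."""
    return [d for d in store if d.visibility in ("All", agent)]


# A small store mirroring the first rows of the table (texts are placeholders).
store = [
    KnowledgeDoc("Battlefield situation data", "Globally shared", "All", "situation"),
    KnowledgeDoc("Initial task plan", "Army part", "Decision Hub Agent", "army plan"),
    KnowledgeDoc("Initial task plan", "Army part", "Army Command Agent", "army plan"),
    KnowledgeDoc("Initial task plan", "Air force part", "Decision Hub Agent", "air plan"),
    KnowledgeDoc("Initial task plan", "Air force part", "Air Force Command Agent", "air plan"),
]

army_view = visible_docs(store, "Army Command Agent")
hub_view = visible_docs(store, "Decision Hub Agent")
```

Under this assumption the Army Command Agent sees the shared situation data plus the Army part of the plan, while the Decision Hub Agent sees the shared data and both service parts.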

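Side	Combat unit	Number of units
Blue	Air base	1
Blue	Fighter	34
Blue	Early-warning aircraft	2
Blue	Anti-submarine aircraft	3
Blue	Air-search radar	1
Blue	Surface-to-air missile platoon	3
Blue	Infantry platoon	3
Blue	Underwater acoustic surveillance system	2
Red	Air base	1
Red	Fighter	34
Red	Early-warning aircraft	2
Red	Anti-submarine aircraft	3
Red	Air-search radar	1
Red	Surface-to-air missile platoon	3
Red	Infantry platoon	3
Red	Frigate	1
Red	Submarine	1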

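Combat unit	Score	Phase objective	Score
Air base	±100	Ground forces break through the blockade	±80
Fighter	±10	Fighters break through the blockade and enter the airspace over the enemy airfield	±100
Early-warning aircraft	±80
Anti-submarine aircraft	±30
Air-search radar	±50
Surface-to-air missile platoon	±50
Infantry platoon	±10
Underwater acoustic surveillance system	±10
Frigate	±80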
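
As a worked illustration of the scoring table above, the sketch below totals one side's score. It assumes, without confirmation from the source, that the "±" sign means a side gains the listed value for each enemy unit destroyed or objective achieved and loses it for each of its own units destroyed or objective conceded to the opponent. The function name `side_score`, the dictionary keys, and the sample engagement are hypothetical.

```python
# Unit values taken from the scoring table; no submarine row appears there,
# so none is included here.
UNIT_SCORES = {
    "Air base": 100,
    "Fighter": 10,
    "Early-warning aircraft": 80,
    "Anti-submarine aircraft": 30,
    "Air-search radar": 50,
    "Surface-to-air missile platoon": 50,
    "Infantry platoon": 10,
    "Underwater acoustic surveillance system": 10,
    "Frigate": 80,
}

# Phase-objective values from the same table (keys are hypothetical short names).
OBJECTIVE_SCORES = {
    "ground_breakthrough": 80,   # ground forces break through the blockade
    "fighter_breakthrough": 100, # fighters enter airspace over the enemy airfield
}


def side_score(enemy_losses, own_losses, achieved=(), conceded=()):
    """Total score for one side under the assumed reading of the +/- sign."""
    score = 0
    for unit, count in enemy_losses.items():
        score += UNIT_SCORES[unit] * count   # gain for each enemy unit destroyed
    for unit, count in own_losses.items():
        score -= UNIT_SCORES[unit] * count   # penalty for each own unit lost
    score += sum(OBJECTIVE_SCORES[name] for name in achieved)
    score -= sum(OBJECTIVE_SCORES[name] for name in conceded)
    return score


# Hypothetical engagement: destroy 3 fighters and 1 radar, lose 1 early-warning
# aircraft, and achieve the ground breakthrough: 30 + 50 - 80 + 80 = 80.
example = side_score(
    enemy_losses={"Fighter": 3, "Air-search radar": 1},
    own_losses={"Early-warning aircraft": 1},
    achieved=["ground_breakthrough"],
)
```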