Research on Learnable Wargame Agent Driven by Battle Scheme

doi:10.16182/j.issn1004731x.joss.23-0477

Abstract

To enable an agent to cope with complex battle scenarios and objectives in wargames, a learnable wargame agent architecture driven by a battle scheme is proposed. The learnability requirements of the agent are derived by analyzing two properties of its relationship to the wargame system: its "attachment characteristics" and its "loose coupling characteristics". In the agent framework, the battle scheme is used to narrow the scope of what the agent must learn. A finite state machine encodes the scheme's knowledge of the operational phases, and the agent's decision space is determined by the framework of the battle scheme. A learnable deep neural network is designed to explore the key parts of this decision space; it is trained in two modes, imitation learning from prior knowledge and deep reinforcement learning. The architecture can iteratively explore optimal deployment and coordination of multiple wargame pieces, problems that are difficult for humans to work out fully.
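The division of labor described in the abstract can be illustrated with a minimal sketch, in which a finite state machine tracks the operational phase and a learnable policy chooses only among the actions the battle scheme allows in that phase. Everything named here is an assumption made for illustration, not the authors' implementation: the phase names, the turn-based transition rule, the action labels, and the `first_choice_policy` stand-in that takes the place of the deep neural network.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Dict, List


class Phase(Enum):
    """Hypothetical operational phases (the abstract does not list the real ones)."""
    DEPLOY = auto()
    ADVANCE = auto()
    ASSAULT = auto()


@dataclass
class BattleScheme:
    """Hypothetical battle scheme: per-phase decision space and phase boundaries."""
    allowed_actions: Dict[Phase, List[str]]  # restricted decision space per phase
    phase_end_turn: Dict[Phase, int]         # turn at which a phase hands over


# A policy maps (phase, candidate actions) to one chosen action.
# In the paper this role is played by a trained neural network.
Policy = Callable[[Phase, List[str]], str]


def first_choice_policy(phase: Phase, candidates: List[str]) -> str:
    """Placeholder policy: always pick the first candidate."""
    return candidates[0]


class SchemeDrivenAgent:
    """FSM over phases; the policy only decides inside the current phase."""

    def __init__(self, scheme: BattleScheme, policy: Policy) -> None:
        self.scheme = scheme
        self.policy = policy
        self.phase = Phase.DEPLOY

    def _advance_phase(self, turn: int) -> None:
        order = list(Phase)  # Enum members iterate in definition order
        # Move forward while the current phase has expired; the last phase never ends.
        while self.phase != order[-1] and turn >= self.scheme.phase_end_turn[self.phase]:
            self.phase = order[order.index(self.phase) + 1]

    def act(self, turn: int) -> str:
        self._advance_phase(turn)
        candidates = self.scheme.allowed_actions[self.phase]
        action = self.policy(self.phase, candidates)
        if action not in candidates:
            raise ValueError(f"policy chose {action!r}, not allowed in {self.phase.name}")
        return action


scheme = BattleScheme(
    allowed_actions={
        Phase.DEPLOY: ["place_unit", "hold"],
        Phase.ADVANCE: ["move", "hold"],
        Phase.ASSAULT: ["fire", "move"],
    },
    phase_end_turn={Phase.DEPLOY: 2, Phase.ADVANCE: 5},
)
agent = SchemeDrivenAgent(scheme, first_choice_policy)
actions = [agent.act(turn) for turn in (0, 2, 5)]
```

In this sketch the scheme, not the policy, decides when phases change, so a learned policy would only need to cover the per-phase candidate sets rather than the full action space.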

Key words: wargame, agent, battle scheme, deep neural network, reinforcement learning, imitation learning

CLC Number: TP391.9

Sun Yifeng, Li Zhi, Wu Jiang, Wang Yubin. Research on Learnable Wargame Agent Driven by Battle Scheme[J]. Journal of System Simulation, 2024, 36(7): 1525-1535.
