Research on Learnable Wargame Agent Driven by Battle Scheme

doi:10.16182/j.issn1004731x.joss.23-0477

Abstract

To enable an agent to cope with complex battle scenarios and objectives in wargames, a learnable wargame agent architecture driven by a battle scheme is proposed. The agent's learnability requirements are derived by analyzing its "attachment" and "loose coupling" characteristics with respect to the wargame system. In the agent framework, the battle scheme is used to narrow the scope of what the agent must learn: a finite state machine encodes the scheme's knowledge of the operational phases, and the agent's decision space is determined by the structure of the battle scheme. A learnable deep neural network is designed to explore the key decision space. The network supports two training modes: imitation learning from prior knowledge and deep reinforcement learning. The architecture can iteratively explore the optimal deployment and coordination of multiple pieces, problems that are difficult for humans to work out fully.
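The division of labor described above can be illustrated with a minimal sketch. It is not the authors' implementation: the phase names, action lists, observation flags, and the `learned_policy_stub` function are all hypothetical, and a simple score lookup stands in for the deep neural network. The sketch only shows how a finite state machine might track operational phases while the battle scheme restricts the choices handed to a learnable component.

```python
from enum import Enum, auto
from typing import Callable, Dict, List


class Phase(Enum):
    """Hypothetical operational phases taken from a battle scheme."""
    DEPLOY = auto()
    ADVANCE = auto()
    ASSAULT = auto()


# Hypothetical per-phase action sets: the scheme narrows what the agent may choose.
SCHEME_ACTIONS: Dict[Phase, List[str]] = {
    Phase.DEPLOY: ["hold", "move_to_cover"],
    Phase.ADVANCE: ["move_forward", "scout", "hold"],
    Phase.ASSAULT: ["fire", "seize_objective"],
}

# Assumption: only this phase is treated as a "key" decision space to be learned.
LEARNED_PHASES = {Phase.ADVANCE}

Policy = Callable[[Phase, List[str], dict], str]


def scripted_policy(phase: Phase, actions: List[str], obs: dict) -> str:
    """Rule-based default: take the first action listed in the scheme."""
    return actions[0]


def learned_policy_stub(phase: Phase, actions: List[str], obs: dict) -> str:
    """Stand-in for a neural network: pick the allowed action with the highest score."""
    scores = obs.get("scores", {})
    return max(actions, key=lambda a: scores.get(a, 0.0))


class SchemeDrivenAgent:
    def __init__(self, learned: Policy = learned_policy_stub,
                 scripted: Policy = scripted_policy) -> None:
        self.phase = Phase.DEPLOY
        self.learned = learned
        self.scripted = scripted

    def update_phase(self, obs: dict) -> None:
        """Finite-state-machine transitions keyed on hypothetical observation flags."""
        if self.phase is Phase.DEPLOY and obs.get("deployed"):
            self.phase = Phase.ADVANCE
        elif self.phase is Phase.ADVANCE and obs.get("enemy_in_range"):
            self.phase = Phase.ASSAULT

    def act(self, obs: dict) -> str:
        """Advance the phase if needed, then choose among the scheme's allowed actions."""
        self.update_phase(obs)
        actions = SCHEME_ACTIONS[self.phase]
        policy = self.learned if self.phase in LEARNED_PHASES else self.scripted
        return policy(self.phase, actions, obs)
```

In this sketch, a policy trained by imitation learning or reinforcement learning would be passed in as `learned` in place of the stub, while the phases outside `LEARNED_PHASES` stay rule-based.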

Key words: wargame, agent, battle scheme, deep neural network, reinforcement learning, imitation learning

CLC Number:

TP391.9

Sun Yifeng, Li Zhi, Wu Jiang, Wang Yubin. Research on Learnable Wargame Agent Driven by Battle Scheme[J]. Journal of System Simulation, 2024, 36(7): 1525-1535.

Figures/Tables 7

