[1] Lowe R, Wu Y I, Tamar A, et al. Multi-agent Actor-Critic for Mixed Cooperative-Competitive Environments[C]//Advances in Neural Information Processing Systems. San Francisco: Morgan Kaufmann, 2017.
[2] Lillicrap T P, Hunt J J, Pritzel A, et al. Continuous Control with Deep Reinforcement Learning[C/OL]. International Conference on Learning Representations, 2016 [2022-06-11].
[3] Rashid T, Samvelyan M, Schroeder C, et al. QMIX: Monotonic Value Function Factorisation for Deep Multi-agent Reinforcement Learning[C]//International Conference on Machine Learning. New York: PMLR, 2018: 4295-4304.
[4] Watkins C J C H. Learning from Delayed Rewards[D]. London: King's College, 1989.
[5] Rummery G A, Niranjan M. On-line Q-learning Using Connectionist Systems[M]. Cambridge, England: University of Cambridge, Department of Engineering, 1994.
[6] Sutton R S, McAllester D A, Singh S P, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation[C]//Advances in Neural Information Processing Systems. San Francisco: Morgan Kaufmann, 2000: 1057-1063.
[7] Mnih V, Kavukcuoglu K, Silver D, et al. Playing Atari with Deep Reinforcement Learning[J/OL]. [2022-06-11].
[8] Barto A G, Sutton R S, Anderson C W. Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems[J]. IEEE Transactions on Systems, Man, and Cybernetics(S0018-9472), 1983, 27(5): 834-846.
[9] Hernandez-Leal P, Kartal B, Taylor M E. Is Multiagent Deep Reinforcement Learning the Answer or the Question? A Brief Survey[J/OL]. [2022-06-11].
[10] Tampuu A, Matiisen T, Kodelja D, et al. Multiagent Cooperation and Competition with Deep Reinforcement Learning[J]. PLoS One(S1932-6203), 2017, 12(4): e0172395.
[11] Gupta J K, Egorov M, Kochenderfer M. Cooperative Multi-agent Control Using Deep Reinforcement Learning[C]//International Conference on Autonomous Agents and Multiagent Systems. Cham: Springer, 2017: 66-83.
[12] Foerster J N, Assael Y M, De Freitas N, et al. Learning to Communicate with Deep Multi-agent Reinforcement Learning[J/OL]. [2022-06-11].
[13] Sukhbaatar S, Fergus R. Learning Multi-agent Communication with Backpropagation[J]. Advances in Neural Information Processing Systems(S1049-5258), 2016, 29: 2244-2252.
[14] Sunehag P, Lever G, Gruslys A, et al. Value-decomposition Networks for Cooperative Multi-agent Learning[J/OL]. [2022-06-11].
[15] Foerster J, Nardelli N, Farquhar G, et al. Stabilising Experience Replay for Deep Multi-agent Reinforcement Learning[C]//International Conference on Machine Learning. New York: PMLR, 2017: 1146-1155.
[16] 符小卫, 王辉, 徐哲. 基于DE-MADDPG的多无人机协同追捕策略研究[J]. 航空学报, 2022, 43(5): 325311.
Fu Xiaowei, Wang Hui, Xu Zhe. Cooperative Pursuit Strategy for Multi-UAVs Based on DE-MADDPG Algorithm[J]. Acta Aeronautica et Astronautica Sinica, 2022, 43(5): 325311.
[17] Schaul T, Quan J, Antonoglou I, et al. Prioritized Experience Replay[J/OL]. [2022-06-11].
[18] Vaswani A, Shazeer N, Parmar N, et al. Attention is All You Need[C]//Advances in Neural Information Processing Systems. San Francisco: Morgan Kaufmann, 2017: 5998-6008.
[19] Hu J, Shen L, Sun G. Squeeze-and-Excitation Networks[C]//IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7132-7141.
[20] Iqbal S, Sha F. Actor-Attention-Critic for Multi-agent Reinforcement Learning[C]//International Conference on Machine Learning. New York: PMLR, 2019: 2961-2970.
[21] Oh J, Chockalingam V, Lee H. Control of Memory, Active Perception, and Action in Minecraft[C]//International Conference on Machine Learning. New York: PMLR, 2016: 2790-2799.