Research and Development of Simulation Training Platform for Multi-agent Collaborative Decision-making

doi:10.16182/j.issn1004731x.joss.23-FZ0821

Abstract

A reinforcement learning simulation platform provides an interactive training environment for reinforcement learning. To make the simulation platform compatible with multi-agent reinforcement learning algorithms and to meet the needs of simulation in the military field, the processes shared by multi-agent reinforcement learning algorithms are abstracted and a unified interface is designed, so that different types of deep reinforcement learning algorithms can be embedded and verified on the platform. The back-end service of the platform is also optimized to accelerate the training of the algorithm models. The experimental results show that, with the unified interface, the platform is compatible with many different types of multi-agent reinforcement learning algorithms, and that training efficiency improves significantly after the back-end service framework is reconstructed and the parameters are quantized.
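
As an illustration of what a unified algorithm interface of this kind might look like, the following is a minimal sketch, not the platform's actual API. All names (`MultiAgentAlgorithm`, `select_actions`, `store`, `update`, `RandomAlgorithm`, `ToyEnv`, `run_episode`) are hypothetical. The sketch assumes that the steps shared across multi-agent algorithms are action selection, experience storage, and a learning update, and it uses a toy environment in place of the simulation platform.

```python
import random
from abc import ABC, abstractmethod
from typing import Dict, List

Obs = List[float]
Action = int


class MultiAgentAlgorithm(ABC):
    """Hypothetical unified interface: steps assumed common to MARL algorithms."""

    @abstractmethod
    def select_actions(self, observations: Dict[str, Obs]) -> Dict[str, Action]:
        """Map each agent's observation to an action."""

    @abstractmethod
    def store(self, obs, actions, rewards, next_obs, done) -> None:
        """Record one joint transition."""

    @abstractmethod
    def update(self) -> None:
        """Run one learning step on the stored experience."""


class RandomAlgorithm(MultiAgentAlgorithm):
    """Placeholder algorithm: uniform random actions, no learning."""

    def __init__(self, n_actions: int, seed: int = 0) -> None:
        self.n_actions = n_actions
        self.rng = random.Random(seed)
        self.buffer: list = []

    def select_actions(self, observations):
        return {name: self.rng.randrange(self.n_actions) for name in observations}

    def store(self, obs, actions, rewards, next_obs, done):
        self.buffer.append((obs, actions, rewards, next_obs, done))

    def update(self):
        pass  # a learning algorithm would perform a gradient step here


class ToyEnv:
    """Stand-in for the simulation platform: two agents, fixed-length episodes."""

    def __init__(self, agents=("uav_0", "uav_1"), horizon: int = 5) -> None:
        self.agents = list(agents)
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return {a: [0.0] for a in self.agents}

    def step(self, actions):
        self.t += 1
        obs = {a: [float(self.t)] for a in self.agents}
        rewards = {a: float(actions[a] == 0) for a in self.agents}
        return obs, rewards, self.t >= self.horizon


def run_episode(env, algo: MultiAgentAlgorithm) -> int:
    """Training loop written against the interface only, not a concrete algorithm."""
    obs, done, steps = env.reset(), False, 0
    while not done:
        actions = algo.select_actions(obs)
        next_obs, rewards, done = env.step(actions)
        algo.store(obs, actions, rewards, next_obs, done)
        algo.update()
        obs, steps = next_obs, steps + 1
    return steps


if __name__ == "__main__":
    algo = RandomAlgorithm(n_actions=3)
    print(run_episode(ToyEnv(), algo), len(algo.buffer))
```

Because `run_episode` depends only on the abstract methods, swapping in a different algorithm would require no change to the loop, which is the kind of compatibility the abstract describes.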

Key words: artificial intelligence, multi-agent, reinforcement learning, virtual simulation, training acceleration

CLC Number: TP391.9

Cheng Cheng, Chen Zhijie, Guo Ziming, Li Ni. Research and Development of Simulation Training Platform for Multi-agent Collaborative Decision-making[J]. Journal of System Simulation, 2023, 35(12): 2669-2679.

No.	Training time, original synchronous framework	Training time, optimized framework
Mean	83.75	59.40
1	83.98	59.41
2	83.56	59.21
3	83.71	59.59
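
The relative saving implied by the three runs in the table above can be worked out directly. The short sketch below recomputes the two means and derives the fractional time reduction and the speedup factor. The time unit is not stated in this excerpt, and neither ratio depends on it. The variable and function names are illustrative only.

```python
from statistics import mean

# Per-run training times copied from the table (unit not given in the source).
original = [83.98, 83.56, 83.71]
optimized = [59.41, 59.21, 59.59]


def summarize(before, after):
    """Return both means, the fractional time reduction, and the speedup factor."""
    mean_before, mean_after = mean(before), mean(after)
    reduction = (mean_before - mean_after) / mean_before
    speedup = mean_before / mean_after
    return mean_before, mean_after, reduction, speedup


mb, ma, red, sp = summarize(original, optimized)
print(f"mean before = {mb:.2f}, mean after = {ma:.2f}")
print(f"time reduction = {red:.1%}, speedup = {sp:.2f}x")
```

The recomputed means match the reported averages (83.75 and 59.40), which corresponds to roughly a 29% reduction in training time, or a speedup of about 1.41x.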

Agent type	Algorithm model	Reward design	Number of UAVs/ships
UAV	MADDPG	Potential-function-based design	2-4
Ship	―	―	1-2
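
The table above states that the UAV reward is designed with a potential-function method, but this excerpt does not give the function itself. As a hedged illustration, the sketch below shows the standard potential-based shaping form, r' = r + γΦ(s') − Φ(s), with an assumed potential equal to the negative Euclidean distance to a target. The potential, the 2-D positions, and all names are assumptions for illustration, not the paper's design.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def potential(position: Point, target: Point) -> float:
    """Assumed potential: negative Euclidean distance to the target."""
    return -math.dist(position, target)


def shaped_reward(base_reward: float, pos: Point, next_pos: Point,
                  target: Point, gamma: float = 0.99) -> float:
    """Potential-based shaping: r + gamma * phi(s') - phi(s)."""
    return base_reward + gamma * potential(next_pos, target) - potential(pos, target)


target = (0.0, 0.0)
# Moving from distance 5 to distance 4 earns a positive shaping bonus.
closer = shaped_reward(0.0, (3.0, 4.0), (0.0, 4.0), target, gamma=1.0)
# The reverse move is penalized by the same amount.
farther = shaped_reward(0.0, (0.0, 4.0), (3.0, 4.0), target, gamma=1.0)
print(closer, farther)
```

With this choice of potential, steps that bring an agent closer to the target receive a bonus and steps that move it away receive a penalty, which densifies an otherwise sparse task reward.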
