Journal of System Simulation ›› 2026, Vol. 38 ›› Issue (5): 1277-1289. doi: 10.16182/j.issn1004731x.joss.25-0743


Multi-agent Reinforcement Learning Method for Wargame Simulation Based on Suboptimal Demonstration Guidance

Zhou Zicong, Zeng Junjie, Hu Yue, Zhu Zhengqiu, Yin Quanjun   

  1. College of Systems Engineering, National University of Defense Technology, Changsha 410073, China
  • Received: 2025-08-03 Revised: 2025-12-04 Online: 2026-05-21 Published: 2026-05-29
  • Contact: Zeng Junjie

Abstract:

Traditional decision-making models for wargame agents exhibit fixed behavior patterns and adapt poorly to complex adversarial environments. To address these issues, this paper proposes a multi-agent reinforcement learning method based on suboptimal demonstrations (MARLSD). The method integrates reward relabeling with a self-imitation learning mechanism. This improves the training efficiency of multi-agent reinforcement learning algorithms in environments with large state-action spaces and sparse rewards, even when only a small number of suboptimal demonstrations are available, and it encourages agents to explore better strategies. Experimental results show that, compared with baselines such as QMIX and MAGAIL, MARLSD significantly improves performance and training efficiency, is compatible with various value-decomposition multi-agent reinforcement learning algorithms, and achieves strong results using only a small number of suboptimal demonstration trajectories.
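
The abstract does not specify how reward relabeling and self-imitation are combined, so the following is only an illustrative sketch and not the authors' implementation. It assumes SQIL-style constant relabeling (reward 1 for demonstration transitions, 0 for the agent's own transitions) and a simple self-imitation rule that promotes an agent episode into the demonstration set once its environment return exceeds the best return stored so far. The names `DemoGuidedBuffer`, `relabel`, and `episode_return`, the toy transition format, and the promotion rule are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Toy transition: (state, joint_action, reward, next_state). Hypothetical format.
Transition = Tuple[int, int, float, int]
Trajectory = List[Transition]


def episode_return(trajectory: Trajectory) -> float:
    """Undiscounted sum of the environment rewards in one episode."""
    return sum(r for (_, _, r, _) in trajectory)


def relabel(trajectory: Trajectory, reward: float) -> Trajectory:
    """Overwrite every reward in the trajectory with a constant (assumed scheme)."""
    return [(s, a, reward, s2) for (s, a, _, s2) in trajectory]


@dataclass
class DemoGuidedBuffer:
    """Two replay pools: demonstration-like data and ordinary agent data."""
    demo: Trajectory = field(default_factory=list)
    agent: Trajectory = field(default_factory=list)
    best_return: float = float("-inf")

    def add_demo(self, trajectory: Trajectory) -> None:
        # Suboptimal demonstrations are stored with reward 1 and set the
        # initial return threshold that the agent has to beat.
        self.best_return = max(self.best_return, episode_return(trajectory))
        self.demo.extend(relabel(trajectory, 1.0))

    def add_episode(self, trajectory: Trajectory) -> bool:
        # Self-imitation (assumed rule): an agent episode that beats the best
        # stored return is treated as a new demonstration.
        ret = episode_return(trajectory)
        if ret > self.best_return:
            self.best_return = ret
            self.demo.extend(relabel(trajectory, 1.0))
            return True
        # Otherwise the episode is stored as ordinary experience with reward 0.
        self.agent.extend(relabel(trajectory, 0.0))
        return False
```

Under these assumptions, a value-decomposition learner such as QMIX would sample training batches from both pools, so the relabeled rewards supply a dense learning signal where the environment reward is sparse, and the rising threshold lets the agents move past the quality of the original demonstrations.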

Key words: suboptimal demonstration, sparse reward, self-imitation learning, wargame simulation, multi-agent reinforcement learning

CLC Number: