Journal of System Simulation, 2024, Vol. 36, Issue (7): 1525-1535. doi: 10.16182/j.issn1004731x.joss.23-0477

• Research Paper •

Research on Learnable Wargame Agent Driven by Battle Scheme

Sun Yifeng1, Li Zhi1, Wu Jiang1, Wang Yubin2

  1. Strategic Support Force Information Engineering University, Zhengzhou 450001, China
  2. PLA 66389 Troops, Zhengzhou 450000, China
  • Received: 2023-04-21 Revised: 2023-06-12 Online: 2024-07-15 Published: 2024-07-12
  • About the first author: Sun Yifeng (1976-), male, associate professor, Ph.D.; research interests include artificial intelligence and information security. E-mail: yfsun001@163.com

Abstract:

To enable an agent to cope with complex battle scenarios and operational objectives in wargaming, a learnable wargame agent architecture driven by a battle scheme is proposed. The agent's "attachment" and "loose coupling" characteristics with respect to the wargame system are analyzed to derive the requirements for a learnable agent. In the framework design, the battle scheme is used to narrow the scope of what the agent must learn: a finite state machine encodes the operational-phase knowledge contained in the battle scheme, and the agent's decision space is determined from the framework of the scheme. A learnable deep neural network is designed to explore the key decision space, and it is trained in two modes, imitation learning from prior knowledge and deep reinforcement learning. The architecture can iteratively explore the optimal deployment and cooperation of multiple pieces, problems that are difficult for humans to work out fully.

Key words: wargame, agent, battle scheme, deep neural network, reinforcement learning, imitation learning
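
The division of labor described in the abstract, with a finite state machine holding the phase knowledge and a learnable component choosing within a restricted decision space, can be illustrated with a minimal sketch. This is not the authors' implementation: the phase names, the action lists, the observation keys, and the function `policy_scores` (a random stand-in for the deep neural network) are all assumptions made for illustration.

```python
import random
from enum import Enum, auto


class Phase(Enum):
    """Operational phases of a hypothetical battle scheme."""
    MANEUVER = auto()
    ASSAULT = auto()
    HOLD = auto()


# Hypothetical scheme: the actions a piece may consider in each phase.
ALLOWED_ACTIONS = {
    Phase.MANEUVER: ["move", "hide"],
    Phase.ASSAULT: ["move", "shoot"],
    Phase.HOLD: ["shoot", "hide"],
}


def next_phase(phase, obs):
    """Finite state machine transition driven by observed conditions."""
    if phase is Phase.MANEUVER and obs["enemy_in_range"]:
        return Phase.ASSAULT
    if phase is Phase.ASSAULT and obs["objective_taken"]:
        return Phase.HOLD
    return phase


def policy_scores(obs, actions, rng):
    """Stand-in for the learnable network: one random score per action.

    A trained model would compute these scores from obs instead.
    """
    return {action: rng.random() for action in actions}


def decide(phase, obs, rng):
    """Advance the FSM, then pick the best-scoring action allowed in that phase."""
    phase = next_phase(phase, obs)
    actions = ALLOWED_ACTIONS[phase]
    scores = policy_scores(obs, actions, rng)
    return phase, max(scores, key=scores.get)


if __name__ == "__main__":
    rng = random.Random(0)
    obs = {"enemy_in_range": True, "objective_taken": False}
    print(decide(Phase.MANEUVER, obs, rng))
```

Because the scheme fixes which actions are available in each phase, the learnable part only has to rank a small set of candidates, which is the sense in which the battle scheme narrows the learning scope.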

CLC number: