Journal of System Simulation ›› 2023, Vol. 35 ›› Issue (4): 786-796. DOI: 10.16182/j.issn1004731x.joss.21-1321



Multi-agent Cooperative Combat Simulation in Naval Battlefield with Reinforcement Learning

Ding Shi, Xuefeng Yan, Lina Gong, Jingxuan Zhang, Donghai Guan, Mingqiang Wei

  1. School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211100, China
  • Received: 2021-12-20  Revised: 2022-03-01  Online: 2023-04-29  Published: 2023-04-12
  • Contact: Mingqiang Wei  E-mail: shiding0614@163.com; mqwei@nuaa.edu.cn
  • About the first author: Ding Shi (1996-), male, master's student; his research interests include multi-agent reinforcement learning. E-mail: shiding0614@163.com
  • Supported by: General Program of the National Natural Science Foundation of China (62172218)


Abstract:

Because the situation of future naval battlefields changes rapidly, it is urgent to realize high-quality combat simulation of the naval battlefield environment with artificial intelligence, so as to comprehensively optimize and improve the combat effectiveness of our army and defeat the enemy. The cooperation of combat units is the key link in naval battlefield combat simulation, and how to achieve balanced decision-making among multiple agents is the first problem to be solved. Based on a decoupled prioritized experience replay mechanism and an attention mechanism, a multi-agent reinforcement learning-based cooperative combat simulation (MARL-CCSA) algorithm is proposed. On top of MARL-CCSA, a multi-scale reward function is designed with expert experience, and a naval battlefield combat simulation environment is built on this function, which makes the training of MARL-CCSA easy to converge. Scenarios are designed for simulation experiments, and the results are compared with those of other algorithms to verify the feasibility and practicability of MARL-CCSA.
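To make the multi-scale reward idea concrete, the following is a minimal, hypothetical Python sketch (not the reward actually used in the paper): a dense step-level term is combined with a sparse episode-level term, and the field names and weights (StepInfo, damage_dealt, w_local, etc.) are illustrative assumptions that would in practice be set from expert experience.

```python
# Illustrative sketch of a multi-scale reward: a fine-grained per-step (local)
# term plus a coarse episode-level (global) term. All names and weights are
# assumptions for illustration only.
from dataclasses import dataclass


@dataclass
class StepInfo:
    damage_dealt: float        # damage our units inflicted this step
    damage_taken: float        # damage our units received this step
    distance_to_target: float  # mean distance of our units to the objective


def local_reward(step: StepInfo) -> float:
    """Dense per-step shaping signal (illustrative weights)."""
    return 0.1 * step.damage_dealt - 0.1 * step.damage_taken - 0.01 * step.distance_to_target


def global_reward(mission_success: bool, surviving_units: int, total_units: int) -> float:
    """Sparse episode-level signal awarded when the scenario terminates."""
    survival_ratio = surviving_units / max(total_units, 1)
    return (10.0 if mission_success else -10.0) + 5.0 * survival_ratio


def multi_scale_reward(step: StepInfo, done: bool, mission_success: bool,
                       surviving_units: int, total_units: int,
                       w_local: float = 1.0, w_global: float = 1.0) -> float:
    """Combine the two scales; the global term is only added at episode end."""
    r = w_local * local_reward(step)
    if done:
        r += w_global * global_reward(mission_success, surviving_units, total_units)
    return r
```

The design intuition, following the abstract, is that the dense local term keeps the training signal rich enough for convergence, while the sparse global term ties each agent's behaviour to the overall mission outcome.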

Key words: combat simulation, collaboration, reinforcement learning, prioritized experience replay, attention mechanism, multi-scale reward function

CLC Number: