Journal of System Simulation, 2024, Vol. 36, Issue (1): 39-49. doi: 10.16182/j.issn1004731x.joss.22-0886

Strategy Optimization Method of Multi-dimension Projection Based on Deep Reinforcement Learning

An Jing1,2,3, Si Guangya3, Zhang Lei1,2,3

  1. Joint Logistics College, PLA National Defense University, Beijing 100858, China
  2. Graduate School, PLA National Defense University, Beijing 100091, China
  3. Joint Operations College, PLA National Defense University, Beijing 100091, China
  • Received: 2022-08-02 Revised: 2022-09-27 Online: 2024-01-20 Published: 2024-01-19
  • Contact: Si Guangya E-mail: anj21_2000@sina.com; sgy863@sina.com
  • About the first author: An Jing (born 1981), female, associate professor, Ph.D. Research interests: military operations research and war design systems engineering. E-mail: anj21_2000@sina.com

Abstract:

Given the strong performance of deep reinforcement learning (DRL) on strategy optimization problems, this paper takes multi-dimension projection operations as the main research object and proposes an operational strategy optimization method that couples a DRL framework with simulation experiments. After reviewing the current state of strategy optimization research, deep learning frameworks are analyzed and compared with respect to the research problem, and a DRL multi-dimension projection strategy model based on the asynchronous advantage actor-critic (A3C) algorithm is constructed. Through simulation experiments and distributed computing, interactive learning between the DRL model and human-out-of-the-loop simulation is realized, and an optimized multi-dimension projection strategy is obtained. The results verify the effectiveness of optimizing strategies by coupling the DRL framework with simulation experiments.

Key words: deep reinforcement learning (DRL), simulation, strategy optimization, multi-dimension projection, asynchronous advantage actor-critic (A3C) algorithm

CLC number: