Journal of System Simulation (系统仿真学报), 2025, Vol. 37, Issue (3): 584-594. DOI: 10.16182/j.issn1004731x.joss.24-0098

An Intelligent Ambulance Regulation Model Based on Online Reinforcement Learning Algorithm

Zhang Lei 1,2, Zhang Xuechao 2, Wang Chao 2, Bo Xianglei 3

  1. Joint Operations College, National Defence University, Beijing 100000, China
    2. Joint Logistics College, National Defence University, Beijing 100858, China
    3. Automobile NCO Academy, Army Military Transportation University of PLA, Bengbu 233011, China
  • Received: 2024-01-24; Revised: 2024-02-05; Online: 2025-03-17; Published: 2025-03-21
  • First author: Zhang Lei (b. 1986), female, Ph.D. candidate; research interest: military operations research.
  • Funding: Military Postgraduate Funding Project of the PLA (JY2022B011)

Abstract:

In emergency scenarios where ambulances are used to evacuate casualties, the rescue capability of the ambulances must be fully coordinated with the real-time status of the casualties in the scenario to achieve the best rescue results. Such problems are generally NP (non-deterministic polynomial) problems, for which traditional deterministic scheduling algorithms perform poorly. Targeting the real-time regulation of ambulances in emergency scenarios, an online reinforcement learning framework based on the DQN algorithm is established, and a corresponding agent is trained for real-time online scheduling. To address the poor repeatability of emergency scenarios and the slow agent training caused by the low accumulation rate of learning samples, a DA-DQN method that combines data augmentation with the traditional DQN algorithm is proposed. Simulation results show that several classical DQN methods can be trained online to obtain an agent that achieves better scheduling results than deterministic algorithms. The treatment failure rate of the classical first-come, first-served scheduling algorithm is about 45.4%, whereas the failure rate after the DQN agent converges is about 25%. Moreover, the agent of the DA-DQN method trains much faster than that of traditional DQN methods, demonstrating the method's potential for regulating practical emergency rescue operations.
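The abstract gives no implementation details, so the following is a minimal, illustrative sketch of the DA-DQN idea it describes: a standard DQN training loop whose replay buffer is enriched with augmented copies of every observed transition, so that scarce emergency-scenario samples go further. The permutation-based augmentation (treating casualty slots in the state vector as exchangeable), the network sizes, and all hyperparameters below are assumptions made for illustration, not the paper's actual design.

```python
# Illustrative DA-DQN sketch (not the paper's code). Assumed state layout:
# a fixed number of casualty slots, each described by a few features; the
# action picks which casualty the ambulance serves next.
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

N_CASUALTIES = 8          # hypothetical number of casualty slots
FEATS = 3                 # e.g. location x, y and remaining survival time
STATE_DIM = N_CASUALTIES * FEATS
N_ACTIONS = N_CASUALTIES  # action = index of the casualty to serve next

q_net = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(),
                      nn.Linear(128, N_ACTIONS))
target_net = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(),
                           nn.Linear(128, N_ACTIONS))
target_net.load_state_dict(q_net.state_dict())  # re-sync periodically in training
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
buffer = deque(maxlen=50_000)

def augment(s, a, r, s2, done, n_copies=4):
    """Generate extra transitions by permuting casualty slots consistently
    in state, action, and next state (assumes slots are exchangeable)."""
    s = s.reshape(N_CASUALTIES, FEATS)
    s2 = s2.reshape(N_CASUALTIES, FEATS)
    for _ in range(n_copies):
        perm = np.random.permutation(N_CASUALTIES)
        new_a = int(np.where(perm == a)[0][0])  # slot now holding casualty a
        yield s[perm].ravel(), new_a, r, s2[perm].ravel(), done

def store(s, a, r, s2, done):
    """Store the real transition plus augmented copies (the 'DA' in DA-DQN)."""
    buffer.append((s, a, r, s2, done))
    buffer.extend(augment(s, a, r, s2, done))

def act(s, eps=0.1):
    """Epsilon-greedy dispatch: pick the casualty slot with the highest Q-value."""
    if random.random() < eps:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return int(q_net(torch.as_tensor(s, dtype=torch.float32)).argmax())

def train_step(batch_size=64, gamma=0.99):
    """One standard DQN update over a minibatch from the augmented buffer."""
    if len(buffer) < batch_size:
        return
    s, a, r, s2, done = map(np.array, zip(*random.sample(buffer, batch_size)))
    s, s2 = (torch.as_tensor(x, dtype=torch.float32) for x in (s, s2))
    a = torch.as_tensor(a, dtype=torch.int64)
    r, done = (torch.as_tensor(x, dtype=torch.float32) for x in (r, done))
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * (1 - done) * target_net(s2).max(1).values
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Because each real transition contributes several augmented ones, the buffer accumulates training samples several times faster than under plain DQN, which is plausibly the mechanism behind the faster convergence the abstract reports; the augmentation actually used in the paper may differ.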

Key words: emergency scenario, ambulance evacuation, online reinforcement learning, data augmentation, action control optimization

CLC Number: