Journal of System Simulation ›› 2021, Vol. 33 ›› Issue (10): 2440-2448. doi: 10.16182/j.issn1004731x.joss.21-0229

• Simulation Modeling Theory and Methods •

DQN-based Path Planning Method and Simulation for Submarine and Warship in Naval Battlefield

Huang Xiaodong1, Yuan Haitao2, Bi Jing3,*, Liu Tao4

  1. Naval Aeronautical University, Yantai 264001, China;
  2. School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China;
  3. Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China;
  4. School of Software Engineering, Beijing Jiaotong University, Beijing 100044, China
  • Received: 2021-03-19  Revised: 2021-04-15  Online: 2021-10-18  Published: 2021-10-18
  • Corresponding author: Bi Jing (1979-), female, Ph.D.; research interests include computational intelligence and deep learning. E-mail: bijing@bjut.edu.cn
  • First author: Huang Xiaodong (1975-), male, postdoctoral fellow and professor; research interests include computer software, system modeling and simulation, and artificial intelligence applications. E-mail: 3065351527@qq.com
  • Funding: Equipment Pre-research Fund (41401020401, 41401050102); National Natural Science Foundation of China (62173013, 62073005, 61802015)

Abstract: To realize multi-agent path planning and target tracking in a complex naval battlefield environment, this work takes agents (submarines or warships) as the research object and proposes a simulation method based on the reinforcement learning algorithm Deep Q Network (DQN). Two neural networks with the same structure but different parameters are designed, which update the actual (target) and estimated Q values, respectively, so that the value function converges. An ε-greedy policy is adopted as the action selection mechanism, and a reward function is designed for the naval battlefield environment, which significantly improves the update speed and generalization ability of Learning with Experience Replay (LER). Simulation results show that, compared with existing path planning algorithms and multi-agent path planning algorithms, each agent can effectively avoid obstacles in unfamiliar environments and achieve more efficient path planning and target tracking after a certain number of learning steps.
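
To make the mechanism summarized above concrete, the following is a minimal, self-contained sketch of the three DQN ingredients the abstract names: two networks with identical structure but separate parameters (an online network producing estimated Q values and a periodically synchronized target network producing the actual/target Q values), ε-greedy action selection, and experience replay. It is written in Python with PyTorch; the network sizes, hyperparameters, state encoding, and action set are illustrative assumptions, not the authors' implementation.

import random
from collections import deque

import torch
import torch.nn as nn


class QNet(nn.Module):
    """Small fully connected Q network mapping a state to one Q value per action."""

    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):
        return self.net(x)


STATE_DIM = 4        # assumed encoding: (agent x, agent y, target x, target y)
N_ACTIONS = 8        # assumed action set: 8 compass headings on a grid
GAMMA = 0.99

online = QNet(STATE_DIM, N_ACTIONS)              # produces estimated Q values
target = QNet(STATE_DIM, N_ACTIONS)              # same structure, separate parameters
target.load_state_dict(online.state_dict())     # start synchronized
optimizer = torch.optim.Adam(online.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)                    # experience replay buffer


def select_action(state: torch.Tensor, epsilon: float) -> int:
    """ε-greedy selection: explore with probability ε, otherwise act greedily."""
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return int(online(state.unsqueeze(0)).argmax(dim=1))


def train_step(batch_size: int = 32):
    """One gradient step pulling estimated Q values toward target-network values."""
    if len(replay) < batch_size:
        return
    # Each replay entry is assumed to be (state, action, reward, next_state, done).
    s, a, r, s2, done = zip(*random.sample(replay, batch_size))
    s, s2 = torch.stack(s), torch.stack(s2)
    a = torch.tensor(a).unsqueeze(1)
    r = torch.tensor(r, dtype=torch.float32)
    done = torch.tensor(done, dtype=torch.float32)
    q_est = online(s).gather(1, a).squeeze(1)      # estimated Q(s, a)
    with torch.no_grad():                          # target Q values, no gradient
        q_tgt = r + GAMMA * (1.0 - done) * target(s2).max(dim=1).values
    loss = nn.functional.mse_loss(q_est, q_tgt)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


def sync_target():
    """Periodically copy online weights into the target network (e.g. every N steps)."""
    target.load_state_dict(online.state_dict())

The reward function itself is environment-specific; for the scenario described in the abstract it would encode penalties for approaching obstacles and rewards for closing on the tracked target, which this sketch delegates to the (hypothetical) simulation environment that fills the replay buffer.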

Key words: deep Q network (DQN), reinforcement learning, multi-agent, path planning, target tracking
