Journal of System Simulation ›› 2021, Vol. 33 ›› Issue (10): 2440-2448. doi: 10.16182/j.issn1004731x.joss.21-0229

• Simulation Modeling Theory and Methods •

DQN-based Path Planning Method and Simulation for Submarine and Warship in Naval Battlefield

Huang Xiaodong1, Yuan Haitao2, Bi Jing3,*, Liu Tao4

  1. Naval Aeronautical University, Yantai 264001, China;
  2. School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China;
  3. Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China;
  4. School of Software Engineering, Beijing Jiaotong University, Beijing 100044, China
  • Received: 2021-03-19  Revised: 2021-04-15  Online: 2021-10-18  Published: 2021-10-18
  • Corresponding author: Bi Jing (1979-), female, Ph.D.; research interests include computational intelligence and deep learning. E-mail: bijing@bjut.edu.cn
  • First author: Huang Xiaodong (1975-), male, postdoctoral fellow and professor; research interests include computer software, system modeling and simulation, and artificial intelligence applications. E-mail: 3065351527@qq.com
  • Funding: Equipment Pre-research Fund (41401020401, 41401050102); National Natural Science Foundation of China (62173013, 62073005, 61802015)

Abstract: To realize multi-agent path planning and target tracking in a complex naval battlefield environment, this work takes agents (submarines or warships) as the research object and proposes a simulation method based on the reinforcement learning algorithm Deep Q Network (DQN). Two neural networks with the same structure but different parameters are designed, which update the actual (target) and estimated Q values, respectively, so that the value function converges. An ε-greedy policy is adopted as the action selection mechanism, and a reward function is designed for the naval battlefield environment, which significantly improves the update speed and generalization ability of Learning with Experience Replay (LER). Simulation results show that, compared with existing path planning algorithms and multi-agent path planning algorithms, each agent can effectively avoid obstacles in unfamiliar environments and achieve more efficient path planning and target tracking after a certain number of learning steps.
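
To make the mechanism summarized above concrete, the following is a minimal, self-contained sketch of the three DQN ingredients the abstract names: two networks with identical structure but separate parameters (an online network producing estimated Q values and a periodically synchronized target network producing the actual/target Q values), ε-greedy action selection, and experience replay. It is written in Python with PyTorch; the network sizes, hyperparameters, state encoding, and action set are illustrative assumptions, not the authors' implementation.

import random
from collections import deque

import torch
import torch.nn as nn


class QNet(nn.Module):
    """Small fully connected Q network mapping a state to one Q value per action."""

    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):
        return self.net(x)


STATE_DIM = 4        # assumed encoding: (agent x, agent y, target x, target y)
N_ACTIONS = 8        # assumed action set: 8 compass headings on a grid
GAMMA = 0.99

online = QNet(STATE_DIM, N_ACTIONS)              # produces estimated Q values
target = QNet(STATE_DIM, N_ACTIONS)              # same structure, separate parameters
target.load_state_dict(online.state_dict())     # start synchronized
optimizer = torch.optim.Adam(online.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)                    # experience replay buffer


def select_action(state: torch.Tensor, epsilon: float) -> int:
    """ε-greedy selection: explore with probability ε, otherwise act greedily."""
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return int(online(state.unsqueeze(0)).argmax(dim=1))


def train_step(batch_size: int = 32):
    """One gradient step pulling estimated Q values toward target-network values."""
    if len(replay) < batch_size:
        return
    # Each replay entry is assumed to be (state, action, reward, next_state, done).
    s, a, r, s2, done = zip(*random.sample(replay, batch_size))
    s, s2 = torch.stack(s), torch.stack(s2)
    a = torch.tensor(a).unsqueeze(1)
    r = torch.tensor(r, dtype=torch.float32)
    done = torch.tensor(done, dtype=torch.float32)
    q_est = online(s).gather(1, a).squeeze(1)      # estimated Q(s, a)
    with torch.no_grad():                          # target Q values, no gradient
        q_tgt = r + GAMMA * (1.0 - done) * target(s2).max(dim=1).values
    loss = nn.functional.mse_loss(q_est, q_tgt)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


def sync_target():
    """Periodically copy online weights into the target network (e.g. every N steps)."""
    target.load_state_dict(online.state_dict())

The reward function itself is environment-specific; for the scenario described in the abstract it would encode penalties for approaching obstacles and rewards for closing on the tracked target, which this sketch delegates to the (hypothetical) simulation environment that fills the replay buffer.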

Key words: deep Q network (DQN), reinforcement learning, multi-agent, path planning, target tracking
