Journal of System Simulation ›› 2026, Vol. 38 ›› Issue (2): 372-386. doi: 10.16182/j.issn1004731x.joss.25-0486

• Machine Learning Algorithms •

Strike Strategy Planning Method of Unmanned Ground Vehicles Based on Improved PPO Algorithm

Wang Bingkun1, Wang Yue1, Yang Mei2, Zhang Pengnian1, Fan Bohao1, Tang Jie1

  1. Northwest Institute of Mechanical & Electrical Engineering, Xianyang 712099, China
  2. College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China
  • Received: 2025-05-28 Revised: 2025-07-29 Online: 2026-02-18 Published: 2026-02-11
  • Contact: Wang Yue
  • About the first author: Wang Bingkun (born 2001), male, master's student. Research interest: intelligent planning and decision-making for unmanned ground vehicles.
  • Supported by:
    Xianyang Science and Technology Innovation Talent Program (L2024CXNLKJRCTDKJRC0005)

Abstract:

Predefined strike rules cannot maximize the hit rate of unmanned ground vehicles (UGVs), and continuous motion planning is difficult to optimize jointly with discrete strike decision-making. To address both problems, an improved proximal policy optimization (PPO) algorithm based on a hybrid action space and a gated recurrent unit (GRU) is proposed. An environment model and a target model are established for the UGV strike mission, together with a three-level UGV model that integrates kinematic constraints, situational awareness, and dynamic decision-making. Two separate policy networks are used: a continuous motion planning network for path planning, and a discrete strike decision network that handles the strike decisions involved in selecting the strike position and the target sequence. A GRU is introduced because the state is only partially observable during decision-making, so the current state has to be inferred from the history of observations. Simulation results show that the method jointly optimizes UGV path planning and strike decision-making and improves the UGV's ability to carry out strike missions autonomously.

Key words: deep reinforcement learning (DRL), unmanned ground vehicle, path planning, strike decision-making, proximal policy optimization (PPO)
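
The abstract names three ingredients: a recurrent memory for partial observability, a hybrid (continuous plus discrete) action space, and PPO. The following is a minimal NumPy sketch of how such pieces can fit together; it is not the authors' implementation. Everything in it is an assumption made for illustration: the class name `HybridGRUPolicy`, the layer sizes, the bias-free GRU, the use of two linear heads on one shared GRU state (the paper describes two separate policy networks), and the factorization of the joint action probability into an independent Gaussian motion term and a categorical strike term.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    e = np.exp(z - z.max())          # subtract max for numerical stability
    return e / e.sum()

class HybridGRUPolicy:
    """Hypothetical policy: GRU memory feeding a continuous and a discrete head."""

    def __init__(self, obs_dim, hid_dim, move_dim, n_strike, seed=0):
        rng = np.random.default_rng(seed)
        init = lambda *shape: rng.normal(0.0, 0.1, shape)
        in_dim = obs_dim + hid_dim
        # GRU weights (biases omitted to keep the sketch short)
        self.Wz, self.Wr, self.Wh = (init(hid_dim, in_dim) for _ in range(3))
        self.W_mu = init(move_dim, hid_dim)       # continuous head: Gaussian mean
        self.log_std = np.full(move_dim, -0.5)    # state-independent log std (assumed)
        self.W_logit = init(n_strike, hid_dim)    # discrete head: categorical logits

    def gru_step(self, obs, h):
        x = np.concatenate([obs, h])
        z = sigmoid(self.Wz @ x)                  # update gate
        r = sigmoid(self.Wr @ x)                  # reset gate
        h_tilde = np.tanh(self.Wh @ np.concatenate([obs, r * h]))
        return (1.0 - z) * h + z * h_tilde        # one common GRU convention

    def heads(self, h):
        return self.W_mu @ h, softmax(self.W_logit @ h)

    def log_prob(self, h, move, strike):
        """Joint log-probability: Gaussian motion term plus categorical strike term."""
        mu, probs = self.heads(h)
        std = np.exp(self.log_std)
        lp_move = np.sum(-0.5 * ((move - mu) / std) ** 2
                         - self.log_std - 0.5 * np.log(2.0 * np.pi))
        return lp_move + np.log(probs[strike])

    def act(self, obs, h, rng):
        h = self.gru_step(obs, h)                 # fold the new observation into memory
        mu, probs = self.heads(h)
        move = mu + np.exp(self.log_std) * rng.standard_normal(mu.shape)
        strike = int(rng.choice(len(probs), p=probs))
        return move, strike, self.log_prob(h, move, strike), h

def ppo_clip_objective(logp_new, logp_old, adv, eps=0.2):
    """Standard PPO clipped surrogate, computed from joint log-probabilities."""
    ratio = np.exp(logp_new - logp_old)
    return np.mean(np.minimum(ratio * adv, np.clip(ratio, 1 - eps, 1 + eps) * adv))

# Usage: roll the policy over a few random observations.
policy = HybridGRUPolicy(obs_dim=6, hid_dim=8, move_dim=2, n_strike=3)
rng = np.random.default_rng(1)
h = np.zeros(8)
for _ in range(5):
    move, strike, logp, h = policy.act(rng.normal(size=6), h, rng)
```

Because the two action components are assumed independent given the GRU state, their log-probabilities add, so a single probability ratio can drive the clipped objective for both the motion and the strike decision. Whether the paper couples its two networks this way is not stated in the abstract.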

CLC Number: