Journal of System Simulation ›› 2025, Vol. 37 ›› Issue (4): 875-881. doi: 10.16182/j.issn1004731x.joss.23-1524

• Papers •

UAV Path Planning Based on Improved Deep Deterministic Policy Gradients

Zhang Sen, Dai Qiangqiang

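  1. College of Information Engineering, Henan University of Science and Technology, Luoyang 471023, Henan, China
  • Received: 2023-12-13 Revised: 2024-01-17 Online: 2025-04-17 Published: 2025-04-16
  • About the first author: Zhang Sen (1984-), male, associate professor, Ph.D.; research interests: advanced robotics and intelligent control technology.
  • Supported by:
    National Natural Science Foundation of China (62271193); Natural Science Foundation of Henan Province (222300420433)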

UAV Path Planning Based on Improved Deep Deterministic Policy Gradients

Zhang Sen, Dai Qiangqiang   

  1. College of Information Engineering, Henan University of Science and Technology, Luoyang 471023, China
  • Received: 2023-12-13 Revised: 2024-01-17 Online: 2025-04-17 Published: 2025-04-16

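Abstract:

To address the poor convergence and ineffective exploration that UAVs exhibit during path planning in complex environments, an improved deep deterministic policy gradient (DDPG) algorithm is proposed. A dual experience pool mechanism stores successful and failed experiences separately, so that the algorithm can use successful experiences to reinforce policy optimization and learn from failed experiences to avoid erroneous paths. The artificial potential field method is introduced to add a guidance term to the planning; this term is combined with the exploration-noise action from the random sampling process so that the selected action is integrated dynamically. A combined reward function built from direction, distance, obstacle-avoidance, and time rewards achieves multi-objective optimization of path planning and addresses the sparse-reward problem. Experimental results show that the algorithm's reward and success rate improve significantly and that it converges in less time.

Keywords: UAV, deep reinforcement learning, path planning, deep deterministic policy gradient, artificial potential field method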
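
As a rough illustration of the dual experience pool idea, the following Python sketch keeps successful and failed experiences in separate buffers and draws mixed minibatches from both. It is a minimal sketch under stated assumptions, not the authors' implementation: the class name `DualReplayBuffer`, the per-episode labelling via `reached_goal`, the fixed `success_ratio`, and the buffer capacity are all hypothetical choices that the abstract does not specify.

```python
import random
from collections import deque


class DualReplayBuffer:
    """Two FIFO pools: one for transitions from successful episodes, one for failed ones."""

    def __init__(self, capacity=10000, success_ratio=0.5, seed=None):
        # capacity and success_ratio are illustrative assumptions, not values from the paper.
        self.success = deque(maxlen=capacity)
        self.failure = deque(maxlen=capacity)
        self.success_ratio = success_ratio
        self.rng = random.Random(seed)

    def add_episode(self, transitions, reached_goal):
        """Store a whole episode in the pool that matches its outcome (assumed labelling rule)."""
        pool = self.success if reached_goal else self.failure
        pool.extend(transitions)

    def sample(self, batch_size):
        """Draw a mixed minibatch; if one pool is short, top up from the other."""
        n_success = min(int(round(batch_size * self.success_ratio)), len(self.success))
        n_failure = min(batch_size - n_success, len(self.failure))
        # If the failure pool could not fill its share, take more from the success pool.
        n_success = min(batch_size - n_failure, len(self.success))
        batch = self.rng.sample(list(self.success), n_success)
        batch += self.rng.sample(list(self.failure), n_failure)
        self.rng.shuffle(batch)
        return batch
```

In a DDPG training loop, a buffer like this would replace the single replay buffer: each finished episode is stored with `add_episode`, and `sample` supplies the minibatch for the critic and actor updates. How the two pools are actually weighted during training is a design choice the abstract leaves open.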

Abstract:

To address the poor convergence and ineffective exploration that arise when UAVs perform path planning in complex environments, an improved deep deterministic policy gradient (DDPG) algorithm is proposed. A dual experience pool mechanism stores successful and failed experiences separately, so that the algorithm can use successful experiences to strengthen policy optimization and learn from failed experiences to avoid erroneous paths. The artificial potential field (APF) method is introduced to add a guidance term to the planning; this term is combined with the exploration-noise action from the random sampling process so that the selected action is integrated dynamically. A combined reward function, made up of direction, distance, obstacle-avoidance, and time reward terms, achieves multi-objective optimization of path planning and addresses the sparse-reward problem. Experiments show that the proposed algorithm significantly improves the reward and success rate and converges in less time.

Key words: UAV, deep reinforcement learning (DRL), path planning, deep deterministic policy gradient (DDPG), artificial potential field (APF) method
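
The APF guidance term and the combined reward described in the abstract could look roughly like the Python sketch below. Everything specific here is an assumption made for illustration: the 2D positions, the textbook attractive/repulsive force form in `apf_direction`, the Gaussian exploration noise and convex mixing weight `beta` in `blend_action`, and the individual reward terms and weights in `combined_reward`. The abstract names the ingredients but does not give their formulas.

```python
import math
import random


def apf_direction(pos, goal, obstacles, k_att=1.0, k_rep=1.0, d0=2.0):
    """Unit vector of a textbook APF force: goal attraction plus repulsion within range d0."""
    fx = k_att * (goal[0] - pos[0])
    fy = k_att * (goal[1] - pos[1])
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if 0.0 < d < d0:
            # Repulsive magnitude k_rep * (1/d - 1/d0) / d^2, directed away from the obstacle.
            mag = k_rep * (1.0 / d - 1.0 / d0) / d ** 2
            fx += mag * dx / d
            fy += mag * dy / d
    norm = math.hypot(fx, fy)
    return (fx / norm, fy / norm) if norm > 0 else (0.0, 0.0)


def blend_action(actor_action, guide, beta, noise_std=0.1, rng=random):
    """Convex mix of the noisy actor action and the APF guide; beta in [0, 1] weights the guide."""
    noisy = [a + rng.gauss(0.0, noise_std) for a in actor_action]
    mixed = [(1.0 - beta) * n + beta * g for n, g in zip(noisy, guide)]
    return [max(-1.0, min(1.0, m)) for m in mixed]  # clip to an assumed [-1, 1] action range


def combined_reward(pos, prev_pos, goal, obstacles, w=(1.0, 1.0, 1.0, 0.01), safe_dist=1.0):
    """Weighted sum of direction, distance, obstacle-avoidance and time terms (forms assumed)."""
    step = (pos[0] - prev_pos[0], pos[1] - prev_pos[1])
    to_goal = (goal[0] - prev_pos[0], goal[1] - prev_pos[1])
    step_len, goal_len = math.hypot(*step), math.hypot(*to_goal)
    # Direction: cosine between the step taken and the bearing to the goal.
    if step_len > 0 and goal_len > 0:
        r_dir = (step[0] * to_goal[0] + step[1] * to_goal[1]) / (step_len * goal_len)
    else:
        r_dir = 0.0
    # Distance: progress made toward the goal during this step.
    r_dist = goal_len - math.hypot(goal[0] - pos[0], goal[1] - pos[1])
    # Obstacle avoidance: penalty that grows once the nearest obstacle is closer than safe_dist.
    nearest = min((math.hypot(pos[0] - ox, pos[1] - oy) for ox, oy in obstacles),
                  default=float("inf"))
    r_obs = -(safe_dist - nearest) / safe_dist if nearest < safe_dist else 0.0
    # Time: constant per-step cost that discourages long paths.
    r_time = -1.0
    return w[0] * r_dir + w[1] * r_dist + w[2] * r_obs + w[3] * r_time
```

Because every step returns a nonzero signal from the direction, distance, and time terms, a reward of this shape is dense rather than sparse. One plausible schedule, again an assumption, is to start with a large `beta` so that APF guidance dominates early exploration and to decay it as the learned policy improves.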

CLC Number: