UAV Path Planning Based on Improved Deep Deterministic Policy Gradients

doi:10.16182/j.issn1004731x.joss.23-1524

Abstract

Abstract:

To address poor convergence and ineffective exploration when UAVs plan paths in complex environments, an improved deep deterministic policy gradient (DDPG) algorithm is proposed. A dual experience pool mechanism stores successful and failed experiences separately, so that the algorithm can use successful experiences to reinforce policy optimization and learn from failed experiences to avoid erroneous paths. The artificial potential field (APF) method is introduced to add a guidance term to the planning; this term is combined with the exploration-noise action produced during random sampling, and the selected actions are dynamically integrated. A combined reward function made up of direction, distance, obstacle-avoidance, and time rewards achieves multi-objective optimization of the path and addresses the sparse-reward problem. Experimental results show that the proposed algorithm significantly improves the reward and the success rate and converges in a shorter time.

Key words: UAV, deep reinforcement learning (DRL), path planning, deep deterministic policy gradient (DDPG), artificial potential field (APF) method
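The dual experience pool described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name `DualExperiencePool`, the episode-level routing by a `reached_goal` flag, and the fixed `success_ratio` used when mixing a minibatch are all assumptions made here for the example.

```python
import random
from collections import deque


class DualExperiencePool:
    """Two FIFO replay pools: one for successful episodes, one for failed ones."""

    def __init__(self, capacity_success, capacity_failure, seed=None):
        # deque(maxlen=...) silently discards the oldest transition when full.
        self.success = deque(maxlen=capacity_success)
        self.failure = deque(maxlen=capacity_failure)
        self.rng = random.Random(seed)

    def add_episode(self, transitions, reached_goal):
        """Route all transitions of a finished episode to one pool (assumed rule)."""
        pool = self.success if reached_goal else self.failure
        pool.extend(transitions)

    def sample(self, batch_size, success_ratio=0.5):
        """Draw a mixed minibatch; if one pool is short, top up from the other."""
        n_success = min(int(round(batch_size * success_ratio)), len(self.success))
        n_failure = min(batch_size - n_success, len(self.failure))
        # If the failure pool could not fill its share, take more successes.
        n_success = min(batch_size - n_failure, len(self.success))
        batch = self.rng.sample(list(self.success), n_success)
        batch += self.rng.sample(list(self.failure), n_failure)
        self.rng.shuffle(batch)
        return batch
```

In a DDPG training loop, `sample` would take the place of the single replay buffer draw; how the two pools are actually weighted in the paper is not stated on this page.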

CLC number: TP273

Zhang Sen, Dai Qiangqiang. UAV Path Planning Based on Improved Deep Deterministic Policy Gradients[J]. Journal of System Simulation, 2025, 37(4): 875-881.

Table 1 Training parameter settings

Parameter	Value	Parameter	Value
Learning rate	0.001	$k_1$	0.3
Discount factor	0.9	$k_2$	10
$E_{max}$	1 000	$k_3$	0.3
Maximum steps	500	$k_4$	15
$P_s$	100 000	$k_5$	5
$P_d$	100 000	$k_6$	50
$\eta$	0.02	$k_7$	1
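The abstract describes a combined reward built from direction, distance, obstacle-avoidance, and time terms, and Table 1 lists coefficients $k_1$ to $k_7$ whose roles are not defined on this page. A weighted sum of that kind might look like the sketch below, assuming a 2-D scene with circular obstacles. The term definitions, the weights `w_dir`, `w_dist`, `w_obs`, `w_time`, and the `safe_radius` parameter are illustrative placeholders, not the authors' formulas or the table's values.

```python
import math


def combined_reward(pos, prev_pos, goal, obstacles, w_dir=1.0, w_dist=1.0,
                    w_obs=1.0, w_time=0.01, safe_radius=1.0):
    """Weighted sum of direction, distance, obstacle and time terms (2-D sketch)."""
    # Direction: cosine between the step just taken and the bearing to the goal.
    step = (pos[0] - prev_pos[0], pos[1] - prev_pos[1])
    to_goal = (goal[0] - prev_pos[0], goal[1] - prev_pos[1])
    norm = math.hypot(*step) * math.hypot(*to_goal)
    r_dir = (step[0] * to_goal[0] + step[1] * to_goal[1]) / norm if norm > 0 else 0.0

    # Distance: positive when the step reduced the distance to the goal.
    r_dist = math.dist(prev_pos, goal) - math.dist(pos, goal)

    # Obstacle avoidance: penalty grows linearly inside the safety margin.
    r_obs = 0.0
    for centre, radius in obstacles:
        clearance = math.dist(pos, centre) - radius
        if clearance < safe_radius:
            r_obs -= (safe_radius - clearance) / safe_radius

    # Time: constant per-step cost that favours shorter paths.
    r_time = -1.0

    return w_dir * r_dir + w_dist * r_dist + w_obs * r_obs + w_time * r_time
```

Because every step returns a nonzero signal, a dense reward of this shape is one common way to ease the sparse-reward problem the abstract mentions.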

