Multi-UAVs 3D Path Planning Method Based on Random Strategy Search

doi:10.16182/j.issn1004731x.joss.21-0112

Abstract:

Traditional multi-UAV path planning methods that ignore energy-consumption constraints struggle to meet emergency rescue requirements in complex mountain operating environments. To address this, a three-dimensional path planning algorithm for multiple UAVs is proposed based on the LSTM-DPPO (long short-term memory-distributed proximal policy optimization) framework. A long short-term memory (LSTM) neural network extracts the sequence of important feature-state information from each UAV's flight. After repeated iterative updates, an optimal network parameter model is obtained and, combined with energy consumption, the optimal 3D detection path is generated. Simulation experiments show that the proposed method is more effective than traditional path planning methods and can plan the optimal detection path while minimizing energy consumption.

Key words: multi-UAVs, deep reinforcement learning algorithms, neural network, 3D path planning, energy consumption
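
The abstract names DPPO as the policy-update framework, but this page does not reproduce the loss function. As a rough illustration only, the sketch below shows the standard clipped surrogate objective used by PPO-family methods. The function name `ppo_clip_objective`, the clipping value 0.2, and the sample inputs are assumptions made for this example and are not taken from the paper; the LSTM feature extractor and the distributed workers are omitted.

```python
import math

def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (a quantity to maximize).

    new_logp / old_logp: log-probabilities of the taken actions under the
    updated and the data-collecting policy. clip_eps is a hypothetical
    default, not a value reported in the paper.
    """
    terms = []
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the new and old policy.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic choice between the unclipped and clipped terms.
        terms.append(min(ratio * adv, clipped * adv))
    return sum(terms) / len(terms)

# Identical policies give a ratio of 1, so the result is the mean advantage.
print(ppo_clip_objective([0.0, 0.0], [0.0, 0.0], [1.0, -1.0]))
```

When the two policies coincide, the objective reduces to the mean advantage (0.0 for the sample inputs). The clipping only takes effect once the updated policy drifts away from the one that collected the data.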

CLC number: TP183

Sen Zhang, Mengyan Zhang, Jingping Shao, Jiexin Pu. Multi-UAVs 3D Path Planning Method Based on Random Strategy Search[J]. Journal of System Simulation, 2022, 34(6): 1286-1295.

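Table 1. Comparison of metrics with and without the energy-consumption constraint

Metric	Constraint	300th iteration	1000th iteration
Total path length of all UAVs/km	Without energy constraint	76.59	64.12
Total path length of all UAVs/km	With energy constraint	65.23	59.14
Number of target areas detected	Without energy constraint	8	10
Number of target areas detected	With energy constraint	7	10
Total energy consumption of all UAVs ×$10^{-3}$/kJ	Without energy constraint	42.486	36.128
Total energy consumption of all UAVs ×$10^{-3}$/kJ	With energy constraint	35.000	27.416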

Table 2. Comparison of local data for LSTM-DPPO and the A* algorithm

Algorithm	Path length/km	Flight time/min	Energy consumption ×$10^{-3}$/kJ
LSTM-DPPO	5.613	11.25	5.283
A*	5.552	11.38	6.192

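Table 3. Comparison of metrics across algorithms

Algorithm	Total path length of all UAVs/km	Number of target areas detected	Total flight time/min	Total energy consumption of all UAVs ×$10^{-3}$/kJ
LSTM-DPPO	59.14	10	18.75	27.416
A*	58.12	10	19.22	33.286
Ant colony	63.85	10	21.27	35.822
Dijkstra	64.67	10	21.87	32.468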
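
To make the energy comparison in Table 3 easier to read, the short sketch below computes how much less total energy LSTM-DPPO uses relative to each baseline. The numbers are copied from the table's energy column; the helper name `relative_saving` is introduced here for illustration and does not come from the paper.

```python
# Total energy of all UAVs, copied from the energy column of Table 3
# (same scaled units as the table; only ratios are used here).
total_energy = {
    "LSTM-DPPO": 27.416,
    "A*": 33.286,
    "Ant colony": 35.822,
    "Dijkstra": 32.468,
}

def relative_saving(baseline, candidate):
    """Percentage by which `candidate` is below `baseline`."""
    return (baseline - candidate) / baseline * 100.0

# Saving of LSTM-DPPO relative to every other algorithm in the table.
savings = {
    name: relative_saving(value, total_energy["LSTM-DPPO"])
    for name, value in total_energy.items()
    if name != "LSTM-DPPO"
}

for name, pct in savings.items():
    print(f"{name}: LSTM-DPPO uses {pct:.1f}% less energy")
```

With the tabulated values this gives roughly 17.6% against A*, 23.5% against ant colony, and 15.6% against Dijkstra. A* still has the shortest total path in the table (58.12 km versus 59.14 km), so the advantage shown is in energy rather than distance.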

