Control Strategy for UAV Cluster Formation Rendezvous Based on LDE-MADDPG Algorithm

doi:10.16182/j.issn1004731x.joss.24-0333

Abstract

UAV cluster formation rendezvous is difficult to achieve with the MADDPG algorithm, so an autonomous collaborative control strategy based on the LDE-MADDPG algorithm is proposed. To address the weak generalization, poor scalability, and slow cluster training of MADDPG, the LDE-MADDPG algorithm is constructed by designing a state feature learning network and a decoupled Critic network. Combining LDE-MADDPG with strategy generation elements such as a decoupled reward function, the cluster state space, and the UAV action space yields a formation rendezvous control strategy that adapts to different formations and cluster sizes. Simulation experiments show that, compared with MADDPG, LDE-MADDPG improves training progress by 19.6%. The generated control strategy completes the rendezvous of six different formations, such as a diamond, within 60 seconds, and completes formation rendezvous for clusters of 6 to 21 UAVs within 80 seconds, showing good generalization and scalability.

Key words: LDE-MADDPG algorithm, state feature learning network, decoupled Critic network model, formation rendezvous

CLC Number: V279

Xiao Wei, Gao Jiabo, Ke Xueliang. Control Strategy for UAV Cluster Formation Rendezvous Based on LDE-MADDPG Algorithm[J]. Journal of System Simulation, 2025, 37(9): 2335-2351.

Figures/Tables 22

Table 1

Experiment parameters of reinforcement learning environment for UAV cluster formation assembly task

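Parameter	Value
UAV size/m	5
Initial UAV velocity $(v_x, v_y)$/(m/s)	$(1, 0)$
Maximum UAV speed $v_{x\max}$, $v_{y\max}$/(m/s)	5
Minimum safe flight separation distance $d_{saf}$/m	3
Task environment size	150 m × 80 m
Time limit for task completion $t_e$/s	120
Maximum allowed distance error $d_e$ between the UAV cluster and the desired formation at rendezvous completion/m	5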
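
The parameters in Table 1 can be collected into a small configuration object. The sketch below is illustrative only: the class name `EnvParams`, the helper functions, the choice to clamp each velocity component independently, and the use of Euclidean distance for the separation test are all assumptions, not details given in the source.

```python
from dataclasses import dataclass
from itertools import combinations
from math import hypot


@dataclass(frozen=True)
class EnvParams:
    """Values copied from Table 1; the field names are hypothetical."""
    uav_size_m: float = 5.0
    init_velocity: tuple = (1.0, 0.0)  # (v_x, v_y) in m/s
    v_max: float = 5.0                 # per-axis speed limit in m/s
    d_saf: float = 3.0                 # minimum safe separation in m
    area_m: tuple = (150.0, 80.0)      # task environment size in m
    t_e: float = 120.0                 # time limit in s
    d_e: float = 5.0                   # max allowed formation error in m


def clamp_velocity(v, p):
    """Assumed rule: limit each velocity component to [-v_max, v_max]."""
    return tuple(max(-p.v_max, min(p.v_max, c)) for c in v)


def violates_safety(positions, p):
    """Assumed rule: any pair closer than d_saf (Euclidean) is unsafe."""
    return any(hypot(ax - bx, ay - by) < p.d_saf
               for (ax, ay), (bx, by) in combinations(positions, 2))


params = EnvParams()
print(clamp_velocity((7.5, -6.0), params))
print(violates_safety([(0, 0), (2, 2), (40, 30)], params))
```

Keeping the limits in one frozen object means a simulation step can be checked against the same numbers that the table reports, without scattering constants through the code.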

Table 2

Training hyperparameters

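Hyperparameter	Value	Meaning
MaxEpisode	150	Number of training episodes
MaxStep	300	Total number of time steps per episode
$\beta$	0.0001	Actor network learning rate
$\alpha_g$	0.001	Global Critic network learning rate
$\alpha$	0.001	Local Critic network learning rate
$\gamma$	0.95	Discount factor for cumulative reward
Buffer-size	500 000	Number of samples the experience replay buffer can hold
Batch-size	128	Mini-batch size used for each network update
$\tau$	0.01	Soft-update rate of the target network parameters
$a_1, a_2, a_3, a_4$	0.3, 0.3, 0.1, 0.3	Reward function adjustment coefficients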
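
As a rough illustration of how the Table 2 values enter training, the sketch below applies the standard soft target update used by DDPG-style algorithms (target <- tau * online + (1 - tau) * target) to plain Python lists, and checks whether the four reward coefficients sum to one. The dictionary keys and the function `soft_update` are hypothetical names. The source does not show its implementation, and reading $a_1$ to $a_4$ as normalized weights is an assumption.

```python
import math

# Values copied from Table 2; the key names are hypothetical.
HYPERPARAMS = {
    "max_episode": 150,
    "max_step": 300,
    "actor_lr": 1e-4,          # beta
    "global_critic_lr": 1e-3,  # alpha_g
    "local_critic_lr": 1e-3,   # alpha
    "gamma": 0.95,
    "buffer_size": 500_000,
    "batch_size": 128,
    "tau": 0.01,
    "reward_coeffs": (0.3, 0.3, 0.1, 0.3),  # a_1 .. a_4
}


def soft_update(target, online, tau):
    """Standard soft update: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]


# Toy parameter vectors standing in for network weights.
target = [0.0, 1.0]
online = [1.0, 1.0]
for _ in range(3):
    target = soft_update(target, online, HYPERPARAMS["tau"])

print(target)
print(math.isclose(sum(HYPERPARAMS["reward_coeffs"]), 1.0))
```

With $\tau = 0.01$ the target parameters move only 1% of the way toward the online parameters at each update, so the target networks change slowly relative to the networks being trained.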

Episode	UAV 1	UAV 2	UAV 3	UAV 4	UAV 5
Episode 80	240	-870	-980	-460	1 550
Episode 140	15 220	10 110	9 420	10 050	11 980
Training progress	249.7/episode	183.0/episode	173.3/episode	175.2/episode	173.8/episode
Episode	UAV 6	UAV 7	UAV 8	UAV 9	Cluster
Episode 80	-990	240	880	450	6
Episode 140	-870	10 150	12 320	11 840	10 011
Training progress	2.0/episode	165.2/episode	190.7/episode	189.8/episode	166.75/episode

Episode	UAV 1	UAV 2	UAV 3	UAV 4	UAV 5
Episode 80	1 080	250	5 210	130	9 450
Episode 140	16 210	12 870	15 890	10 900	18 530
Training progress	252.2/episode	210.3/episode	178.0/episode	179.5/episode	153/episode
Episode	UAV 6	UAV 7	UAV 8	UAV 9	Cluster
Episode 80	280	-980	4 460	7 840	3 080
Episode 140	16 430	10 880	16 090	17 650	15 050
Training progress	269.2/episode	197.7/episode	193.8/episode	163.5/episode	199.5/episode
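
The "training progress" rows in the two tables above appear to be the average change per episode between episode 80 and episode 140, that is, (value at episode 140 - value at episode 80) / 60. This reading is inferred from the numbers and is not a definition given in the source. The sketch below checks it against the cluster columns. If the first table reports MADDPG and the second LDE-MADDPG (also an assumption), the ratio of the two cluster rates reproduces the 19.6% improvement quoted in the abstract.

```python
def training_progress(value_ep80, value_ep140, start=80, end=140):
    """Average change per episode between two checkpoints."""
    return (value_ep140 - value_ep80) / (end - start)


# Cluster columns copied from the two tables.
first_table = training_progress(6, 10_011)
second_table = training_progress(3_080, 15_050)
improvement = (second_table / first_table - 1.0) * 100.0

print(f"{first_table:.2f} per episode")
print(f"{second_table:.2f} per episode")
print(f"{improvement:.1f}% higher")
```

The per-UAV entries are consistent with the same formula, with one exception: for UAV 5 in the second table the formula gives about 151.3 per episode rather than the listed 153.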

Formation	Distance at 20 s/m	Distance at 40 s/m	Distance at 60 s/m	Distance at 80 s/m	Distance at 100 s/m	Rendezvous completion time/s
Triangle (10 UAVs)	18.458	1.570	0.625	0.625	0.625	60
Square (12 UAVs)	17.836	5.753	0.524	0.521	0.521	60
Cross (9 UAVs)	24.851	3.021	0.259	0.259	0.259	60
Heart (8 UAVs)	16.744	1.189	0.218	0.218	0.218	40
Diamond (9 UAVs)	22.964	1.655	0.133	0.133	0.133	60

Number of UAVs	Distance at 20 s/m	Distance at 40 s/m	Distance at 60 s/m	Distance at 80 s/m	Distance at 100 s/m	Rendezvous completion time/s
6	26.308	1.080	0.208	0.208	0.208	40
10	31.796	3.281	0.873	0.793	0.727	60
18	47.711	13.990	2.167	0.294	0.294	80
21	47.626	17.839	4.018	0.123	0.123	80
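
The timing claims in the abstract can be checked directly against the two tables above. The sketch below copies the distance at 100 s and the rendezvous completion time from each row and compares them with the 60 s and 80 s bounds and with the allowed error $d_e$ = 5 m from Table 1. The variable names are hypothetical, and treating the tabulated distance as the quantity bounded by $d_e$ is an assumption.

```python
D_E = 5.0  # maximum allowed distance error in m (Table 1)

# (label, distance at 100 s in m, completion time in s)
formations = [
    ("triangle, 10 UAVs", 0.625, 60),
    ("square, 12 UAVs", 0.521, 60),
    ("cross, 9 UAVs", 0.259, 60),
    ("heart, 8 UAVs", 0.218, 40),
    ("diamond, 9 UAVs", 0.133, 60),
]
cluster_sizes = [
    ("6 UAVs", 0.208, 40),
    ("10 UAVs", 0.727, 60),
    ("18 UAVs", 0.294, 80),
    ("21 UAVs", 0.123, 80),
]


def summarize(rows):
    """Return (slowest completion time, largest final distance)."""
    return max(t for _, _, t in rows), max(d for _, d, _ in rows)


for name, rows, bound in (("formations", formations, 60),
                          ("cluster sizes", cluster_sizes, 80)):
    slowest, worst = summarize(rows)
    print(name, slowest <= bound, worst <= D_E)
```

Note that the formation table lists five formations, while the abstract refers to six, so this check covers only the rows that are available here.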