Journal of System Simulation ›› 2025, Vol. 37 ›› Issue (9): 2335-2351. doi: 10.16182/j.issn1004731x.joss.24-0333

Control Strategy for UAV Cluster Formation Rendezvous Based on LDE-MADDPG Algorithm

Xiao Wei1, Gao Jiabo1,2, Ke Xueliang1

  1. Joint Logistic Support Force Engineering University of PLA, Chongqing 401311, China
  2. PLA 95019 Troops
  • Received: 2024-04-02 Revised: 2024-05-27 Online: 2025-09-18 Published: 2025-09-22
  • About the first author: Xiao Wei (born 1982), female, associate professor, master's supervisor, PhD. Research interests: unmanned intelligent systems, Internet of Things engineering and systems.
  • Funding:
    Science and Technology Research Program of Chongqing Municipal Education Commission (KJZD-K202312903); Chongqing Graduate Research and Innovation Project (CYS23778); Army Logistics Academy Research Project (LQ-ZD-202316); Army Logistics Academy Graduate Research and Innovation Project (LQ-ZD-202209)

Abstract:

To address the limitations of the MADDPG algorithm in UAV cluster formation rendezvous control, a control strategy based on the LDE-MADDPG algorithm is proposed. The LDE-MADDPG algorithm is obtained by designing a state feature learning network and a decoupled Critic network, which are intended to improve the generalization, scalability, and cluster training efficiency of MADDPG. Combining the algorithm with a purpose-built decoupled reward function, cluster state space, and UAV action space yields a formation rendezvous strategy that adapts to different formations and different numbers of UAVs. Simulation experiments show that, compared with MADDPG, LDE-MADDPG improves training efficiency by 19.6%. The generated control strategy completes rendezvous into six formations, including a diamond, within 60 s, and achieves formation rendezvous for clusters of 6 to 21 UAVs within 80 s, demonstrating good generalization and scalability.

Key words: LDE-MADDPG algorithm, state feature learning network, decoupled Critic network, formation rendezvous
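
The abstract names a decoupled reward function but does not give its form. As a rough illustration of the general idea only, the sketch below computes one reward per UAV from that UAV's own distance to its formation slot plus a local proximity penalty, so the same function applies to any cluster size. The function name, the two reward terms, and the parameters `safe_dist` and `collision_penalty` are all assumptions made for this example, not the authors' design.

```python
import math


def decoupled_rewards(positions, slots, safe_dist=1.0, collision_penalty=10.0):
    """Return one reward per UAV (hypothetical shaping, not the paper's function).

    positions: list of (x, y) tuples, current UAV positions.
    slots:     list of (x, y) tuples, the formation slot assigned to each UAV.
    """
    rewards = []
    for i, (pos, slot) in enumerate(zip(positions, slots)):
        # Term 1 (assumed): negative distance from UAV i to its own slot.
        reward = -math.dist(pos, slot)
        # Term 2 (assumed): fixed penalty for each other UAV closer than safe_dist.
        for j, other in enumerate(positions):
            if j != i and math.dist(pos, other) < safe_dist:
                reward -= collision_penalty
        rewards.append(reward)
    return rewards
```

Because each entry depends only on one UAV's own slot and its nearby neighbours, the reward list simply grows with the number of UAVs, which is one plausible reading of how a per-agent reward could support the 6 to 21 UAV range reported above.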

CLC number: