Journal of System Simulation ›› 2025, Vol. 37 ›› Issue (5): 1169-1187. DOI: 10.16182/j.issn1004731x.joss.24-0025

• Research Paper •

A Quadrotor Trajectory Tracking Control Method Based on Deep Reinforcement Learning

Wu Guohua1, Zeng Jiaheng2, Wang Dezhi3, Zheng Long4, Zou Wei5

  1. School of Automation, Central South University, Changsha 410083, China
    2. School of Traffic and Transportation Engineering, Central South University, Changsha 410083, China
    3. School of Meteorology and Oceanography, National University of Defense Technology, Changsha 410015, China
    4. Military Vocational Education Technology Service Center, National University of Defense Technology, Changsha 410015, China
    5. School of Computer Science and Engineering, Central South University, Changsha 410083, China
  • Received: 2024-01-08  Revised: 2024-03-12  Online: 2025-05-20  Published: 2025-05-23
  • Corresponding author: Wang Dezhi
  • First author: Wu Guohua (1986-), male, professor, Ph.D.; research interests include UAV systems and reinforcement learning algorithm design.
  • Funding: National Natural Science Foundation of China (62373380)

Abstract:

Constrained by the fixed structure determined by their model equations, traditional quadrotor controller designs struggle to cope with the control errors caused by variations in model parameters and environmental disturbances. This paper proposes a quadrotor trajectory tracking control method based on deep reinforcement learning: it formulates the corresponding Markov decision process model and, on top of the PPO framework, develops the PPO-SAG (PPO with self-adaptive guide) algorithm. PPO-SAG adds a self-adaptive mechanism to the learning process that draws on PID expert knowledge for guidance, improving training convergence and stability. Tailored to the characteristics of the problem, an objective function with a distance-constraint penalty and an entropy strategy is designed, and a disturbance-error information supplement structure and a trajectory feature selection structure are proposed to supply control-error information and extract the key elements of the future trajectory, further improving convergence. In addition, dynamic state normalization, batch normalization of the advantage function, and reward scaling are employed to handle state representation and reward-advantage expression in three-dimensional space more reasonably. Experiments on single and mixed trajectories show that the proposed PPO-SAG algorithm achieves the best convergence and stability, and ablation experiments confirm that each proposed mechanism and structure contributes positively. The studied problem, deep-reinforcement-learning-based quadrotor trajectory tracking control under unknown disturbances, provides a path toward more robust and efficient quadrotor controllers.
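
To make the guided-learning idea concrete, the following is a minimal sketch, assuming a PyTorch Gaussian policy over rotor commands, of how a PPO objective could combine the clipped surrogate loss, an entropy bonus, a distance-constraint penalty, and a self-adaptive term that pulls the policy toward a PID expert. The weight schedule, the 1 m tolerance, and all names are illustrative assumptions, not the paper's implementation.

```python
import torch

def ppo_sag_loss(policy, states, actions, old_log_probs, advantages,
                 pid_actions, track_error,
                 clip_eps=0.2, ent_coef=0.01, dist_coef=0.5, guide_coef=1.0):
    dist = policy(states)  # assumed: returns a torch.distributions.Normal over rotor commands
    log_probs = dist.log_prob(actions).sum(-1)
    ratio = torch.exp(log_probs - old_log_probs)

    # Standard PPO clipped surrogate objective.
    surrogate = torch.min(ratio * advantages,
                          torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages)

    # Entropy bonus keeps exploration alive.
    entropy = dist.entropy().sum(-1)

    # Distance-constraint penalty: punish straying beyond an assumed 1 m tube
    # around the reference trajectory.
    dist_penalty = torch.relu(track_error - 1.0).pow(2)

    # Self-adaptive guide term: pull the policy mean toward the PID expert's
    # action, with a weight that grows with tracking error and fades as the
    # learned policy improves (assumed schedule).
    guide_w = guide_coef * torch.tanh(track_error.mean()).detach()
    guide = (dist.mean - pid_actions).pow(2).sum(-1)

    return (-surrogate - ent_coef * entropy
            + dist_coef * dist_penalty).mean() + guide_w * guide.mean()
```

Guidance that decays with tracking performance lets the PID prior stabilize early training without capping what the learned policy can ultimately achieve.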
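The trajectory feature selection structure can likewise be sketched as an attention block in which the current UAV state queries the next K reference waypoints, so the policy can weight the track segments that matter most; the dimensions and names below are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class TrajectoryFeatureSelector(nn.Module):
    """Attend over the next K reference waypoints with the UAV state as query."""
    def __init__(self, state_dim=12, wp_dim=3, d_model=64, n_heads=4):
        super().__init__()
        self.state_proj = nn.Linear(state_dim, d_model)  # query: current state
        self.wp_proj = nn.Linear(wp_dim, d_model)        # keys/values: waypoints
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, state, waypoints):
        # state: (B, state_dim); waypoints: (B, K, wp_dim) future track points
        q = self.state_proj(state).unsqueeze(1)          # (B, 1, d_model)
        kv = self.wp_proj(waypoints)                     # (B, K, d_model)
        ctx, weights = self.attn(q, kv, kv)              # weight the segments ahead
        return ctx.squeeze(1), weights                   # (B, d_model) track summary

# Example: summarize the next 10 waypoints for a batch of 8 states.
feat, attn_w = TrajectoryFeatureSelector()(torch.randn(8, 12), torch.randn(8, 10, 3))
```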

Key words: deep reinforcement learning, quadrotor trajectory tracking control, proximal policy optimization (PPO), adaptive mechanism, attention mechanism

CLC number: