Journal of System Simulation, 2025, Vol. 37, Issue (5): 1169-1187. DOI: 10.16182/j.issn1004731x.joss.24-0025

A Quadrotor Trajectory Tracking Control Method Based on Deep Reinforcement Learning
Wu Guohua1, Zeng Jiaheng2, Wang Dezhi3, Zheng Long4, Zou Wei5
Received: 2024-01-08
Revised: 2024-03-12
Online: 2025-05-20
Published: 2025-05-23
Contact: Wang Dezhi
CLC Number:
Wu Guohua, Zeng Jiaheng, Wang Dezhi, Zheng Long, Zou Wei. A Quadrotor Trajectory Tracking Control Method Based on Deep Reinforcement Learning[J]. Journal of System Simulation, 2025, 37(5): 1169-1187.
Table 4  Comparison of tracking errors (mean ± std) of multiple algorithms under small disturbances within the training range
| Trajectory | SAC(S) | PPO(S) | PPO-PPD(S) | PPO-SAG(P) | PPO-SAG(S) | PPO-SAG(FC-M) | PPO-SAG(M) | PID |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.196 ± 0.095 | 0.072 ± 0.036 | 0.061 ± 0.061 | 1.055 ± 0.430 | 0.054 ± 0.037 | 0.092 ± 0.064 | 0.087 ± 0.052 | 0.248 ± 0.073 |
| 2 | 0.473 ± 0.198 | 0.048 ± 0.032 | 0.038 ± 0.032 | 1.024 ± 0.459 | 0.030 ± 0.021 | 0.087 ± 0.074 | 0.075 ± 0.048 | 0.227 ± 0.078 |
| 3 | 0.404 ± 0.128 | 0.092 ± 0.208 | 0.064 ± 0.084 | 1.138 ± 0.369 | 0.038 ± 0.079 | 0.132 ± 0.106 | 0.107 ± 0.077 | 0.267 ± 0.082 |
| 4 | 0.344 ± 0.232 | 0.087 ± 0.105 | 0.073 ± 0.104 | 1.142 ± 0.420 | 0.046 ± 0.026 | 0.151 ± 0.142 | 0.110 ± 0.074 | 0.273 ± 0.212 |
| 5 | 0.979 ± 0.425 | 0.116 ± 0.227 | 0.077 ± 0.079 | 1.372 ± 0.471 | 0.068 ± 0.053 | 0.189 ± 0.371 | 0.152 ± 0.088 | 0.332 ± 0.113 |
| 6 | 0.451 ± 0.197 | 0.121 ± 0.246 | 0.091 ± 0.187 | 1.201 ± 0.436 | 0.062 ± 0.093 | 0.149 ± 0.189 | 0.147 ± 0.124 | 0.279 ± 0.085 |
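Each cell above reports the mean ± standard deviation of the tracking error over a flight along the given trajectory. As a minimal sketch of how such per-trajectory statistics are typically computed (this is not the paper's code: the error definition as a per-step Euclidean position error, and the array names and shapes, are assumptions for illustration):

```python
import numpy as np

def tracking_error_stats(actual: np.ndarray, reference: np.ndarray):
    """Mean and standard deviation of the per-step tracking error.

    actual, reference: (T, 3) arrays of flown and reference positions
    sampled at the same T control steps (assumed layout).
    """
    # Euclidean distance between flown and reference position at each step
    errors = np.linalg.norm(actual - reference, axis=1)
    return errors.mean(), errors.std()

# Example: a circular reference trajectory vs. a slightly perturbed flight
t = np.linspace(0, 2 * np.pi, 500)
ref = np.stack([np.cos(t), np.sin(t), np.full_like(t, 1.0)], axis=1)
flown = ref + np.random.normal(scale=0.05, size=ref.shape)
mean_err, std_err = tracking_error_stats(flown, ref)
print(f"{mean_err:.3f} ± {std_err:.3f}")
```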
Table 6  Comparison of tracking errors (mean ± std) under large disturbances
| Trajectory | PPO-SAG(S) | PPO-SAG(M) | PID |
|---|---|---|---|
| 1 | 0.084 ± 0.174 | 0.120 ± 0.070 | 0.254 ± 0.096 |
| 2 | 0.054 ± 0.218 | 0.104 ± 0.116 | 0.231 ± 0.100 |
| 3 | 0.086 ± 0.252 | 0.159 ± 0.140 | 0.273 ± 0.103 |
| 4 | 0.105 ± 0.276 | 0.162 ± 0.181 | 0.293 ± 0.185 |
| 5 | 0.125 ± 0.289 | 0.238 ± 0.285 | 0.371 ± 0.472 |
| 6 | 0.131 ± 0.335 | 0.205 ± 0.245 | 0.289 ± 0.137 |
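Table 6 probes robustness to disturbances larger than those seen during training. A hedged sketch of one common way to stage such a test is below; the Gym-style `env.step` signature accepting an external force, the `info["tracking_error"]` key, and the zero-mean Gaussian disturbance model are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def evaluate_with_disturbance(env, policy, n_episodes=10, force_std=0.5):
    """Roll out a trained policy while injecting random external forces.

    Assumes (hypothetically) an environment whose step() accepts the
    action plus a 3-axis disturbance force, and reports the current
    tracking error in its info dict.
    """
    all_errors = []
    for _ in range(n_episodes):
        obs = env.reset()
        done = False
        while not done:
            action = policy(obs)
            # Zero-mean Gaussian force per body axis, resampled each step;
            # a force_std above the training range tests out-of-distribution robustness
            disturbance = np.random.normal(scale=force_std, size=3)
            obs, reward, done, info = env.step(action, disturbance)
            all_errors.append(info["tracking_error"])
    return np.mean(all_errors), np.std(all_errors)
```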