Journal of System Simulation, 2025, Vol. 37, Issue (5): 1169-1187. DOI: 10.16182/j.issn1004731x.joss.24-0025

A Quadrotor Trajectory Tracking Control Method Based on Deep Reinforcement Learning
Wu Guohua1, Zeng Jiaheng2, Wang Dezhi3, Zheng Long4, Zou Wei5
Received: 2024-01-08
Revised: 2024-03-12
Online: 2025-05-20
Published: 2025-05-23
Contact: Wang Dezhi
CLC Number:
Wu Guohua, Zeng Jiaheng, Wang Dezhi, Zheng Long, Zou Wei. A Quadrotor Trajectory Tracking Control Method Based on Deep Reinforcement Learning[J]. Journal of System Simulation, 2025, 37(5): 1169-1187.
Table 4  Comparison of tracking errors (mean ± std) of multiple algorithms under small disturbances within the training range
| Trajectory | SAC(S) | PPO(S) | PPO-PPD(S) | PPO-SAG(P) | PPO-SAG(S) | PPO-SAG(FC-M) | PPO-SAG(M) | PID |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.196 ± 0.095 | 0.072 ± 0.036 | 0.061 ± 0.061 | 1.055 ± 0.430 | 0.054 ± 0.037 | 0.092 ± 0.064 | 0.087 ± 0.052 | 0.248 ± 0.073 |
| 2 | 0.473 ± 0.198 | 0.048 ± 0.032 | 0.038 ± 0.032 | 1.024 ± 0.459 | 0.030 ± 0.021 | 0.087 ± 0.074 | 0.075 ± 0.048 | 0.227 ± 0.078 |
| 3 | 0.404 ± 0.128 | 0.092 ± 0.208 | 0.064 ± 0.084 | 1.138 ± 0.369 | 0.038 ± 0.079 | 0.132 ± 0.106 | 0.107 ± 0.077 | 0.267 ± 0.082 |
| 4 | 0.344 ± 0.232 | 0.087 ± 0.105 | 0.073 ± 0.104 | 1.142 ± 0.420 | 0.046 ± 0.026 | 0.151 ± 0.142 | 0.110 ± 0.074 | 0.273 ± 0.212 |
| 5 | 0.979 ± 0.425 | 0.116 ± 0.227 | 0.077 ± 0.079 | 1.372 ± 0.471 | 0.068 ± 0.053 | 0.189 ± 0.371 | 0.152 ± 0.088 | 0.332 ± 0.113 |
| 6 | 0.451 ± 0.197 | 0.121 ± 0.246 | 0.091 ± 0.187 | 1.201 ± 0.436 | 0.062 ± 0.093 | 0.149 ± 0.189 | 0.147 ± 0.124 | 0.279 ± 0.085 |
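Each cell above reports the mean ± standard deviation of the tracking error over a flight along the given trajectory. As a minimal sketch of how such per-trajectory statistics are typically computed (this is not the paper's code: the error definition as a per-step Euclidean position error, and the array names and shapes, are assumptions for illustration):

```python
import numpy as np

def tracking_error_stats(actual: np.ndarray, reference: np.ndarray):
    """Mean and standard deviation of the per-step tracking error.

    actual, reference: (T, 3) arrays of flown and reference positions
    sampled at the same T control steps (assumed layout).
    """
    # Euclidean distance between flown and reference position at each step
    errors = np.linalg.norm(actual - reference, axis=1)
    return errors.mean(), errors.std()

# Example: a circular reference trajectory vs. a slightly perturbed flight
t = np.linspace(0, 2 * np.pi, 500)
ref = np.stack([np.cos(t), np.sin(t), np.full_like(t, 1.0)], axis=1)
flown = ref + np.random.normal(scale=0.05, size=ref.shape)
mean_err, std_err = tracking_error_stats(flown, ref)
print(f"{mean_err:.3f} ± {std_err:.3f}")
```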
Table 6  Comparison of tracking errors (mean ± std) under large disturbances
| Trajectory | PPO-SAG(S) | PPO-SAG(M) | PID |
|---|---|---|---|
| 1 | 0.084 ± 0.174 | 0.120 ± 0.070 | 0.254 ± 0.096 |
| 2 | 0.054 ± 0.218 | 0.104 ± 0.116 | 0.231 ± 0.100 |
| 3 | 0.086 ± 0.252 | 0.159 ± 0.140 | 0.273 ± 0.103 |
| 4 | 0.105 ± 0.276 | 0.162 ± 0.181 | 0.293 ± 0.185 |
| 5 | 0.125 ± 0.289 | 0.238 ± 0.285 | 0.371 ± 0.472 |
| 6 | 0.131 ± 0.335 | 0.205 ± 0.245 | 0.289 ± 0.137 |
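Table 6 probes robustness to disturbances larger than those seen during training. A hedged sketch of one common way to stage such a test is below; the Gym-style `env.step` signature accepting an external force, the `info["tracking_error"]` key, and the zero-mean Gaussian disturbance model are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def evaluate_with_disturbance(env, policy, n_episodes=10, force_std=0.5):
    """Roll out a trained policy while injecting random external forces.

    Assumes (hypothetically) an environment whose step() accepts the
    action plus a 3-axis disturbance force, and reports the current
    tracking error in its info dict.
    """
    all_errors = []
    for _ in range(n_episodes):
        obs = env.reset()
        done = False
        while not done:
            action = policy(obs)
            # Zero-mean Gaussian force per body axis, resampled each step;
            # a force_std above the training range tests out-of-distribution robustness
            disturbance = np.random.normal(scale=force_std, size=3)
            obs, reward, done, info = env.step(action, disturbance)
            all_errors.append(info["tracking_error"])
    return np.mean(all_errors), np.std(all_errors)
```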