Adaptive PID Control Algorithm Based on PPO

doi:10.16182/j.issn1004731x.joss.23-0137

Abstract

A six-axis robotic arm is modeled and simulated in a complex control environment with disturbances using the MATLAB physics engine and Python, providing a trial-and-error training environment for the arm that cannot be provided in reality. The proximal policy optimization (PPO) algorithm from reinforcement learning is used to improve the traditional PID control algorithm. Drawing on the multi-agent idea, and based on the different effects of the three PID parameters on the control system and on the characteristics of the six-axis robotic arm, the three parameters are trained separately as distinct agents, yielding a new multi-agent adaptive PID algorithm that adjusts its parameters adaptively. Simulation results show that the algorithm outperforms the MA-DDPG and MA-SAC algorithms in training convergence. Compared with the traditional PID algorithm, it effectively suppresses disturbances and oscillations and achieves lower overshoot and a shorter settling time, which makes the control process smoother and improves the control accuracy of the robotic arm. These results demonstrate the robustness and effectiveness of the algorithm.
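The idea of treating the three PID parameters as separate agents can be illustrated with a minimal sketch. It is not the authors' implementation: the `AdaptivePID` class, the `constant` stand-in policies, the observation layout, and the toy first-order plant in `simulate` are all assumptions made for illustration. In the paper each gain would come from a trained PPO policy and the plant would be a joint of the six-axis arm.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical interface: each agent maps an observation
# (error, integral of error, derivative of error) to a single PID gain.
Observation = Tuple[float, float, float]
GainPolicy = Callable[[Observation], float]


def constant(gain: float) -> GainPolicy:
    """Stand-in for a trained policy: always returns the same gain."""
    return lambda obs: gain


@dataclass
class AdaptivePID:
    kp_agent: GainPolicy  # agent responsible for the proportional gain
    ki_agent: GainPolicy  # agent responsible for the integral gain
    kd_agent: GainPolicy  # agent responsible for the derivative gain
    dt: float = 0.01
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, setpoint: float, measurement: float) -> float:
        """Query the three agents for gains, then apply the PID law."""
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error

        obs = (error, self.integral, derivative)
        kp = self.kp_agent(obs)
        ki = self.ki_agent(obs)
        kd = self.kd_agent(obs)
        return kp * error + ki * self.integral + kd * derivative


def simulate(controller: AdaptivePID, setpoint: float = 1.0,
             steps: int = 2000, disturbance: float = 0.0) -> float:
    """Toy plant dx/dt = -x + u + d, integrated with forward Euler.

    Returns the plant output after the last step.
    """
    x = 0.0
    for _ in range(steps):
        u = controller.step(setpoint, x)
        x += controller.dt * (-x + u + disturbance)
    return x


if __name__ == "__main__":
    pid = AdaptivePID(constant(2.0), constant(1.0), constant(0.0))
    print(simulate(pid, disturbance=0.5))
```

Replacing each `constant(...)` with a learned policy leaves the control law unchanged; only the source of the gains differs, which is what lets the three parameters be trained as separate agents.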

Key words: RL, PPO algorithm, adaptive PID tuning, robotic arm, multi-agent

CLC Number:

TP242.2

Zhou Zhiyong, Mo Fei, Zhao Kai, Hao Yunbo, Qian Yufeng. Adaptive PID Control Algorithm Based on PPO[J]. Journal of System Simulation, 2024, 36(6): 1425-1432.

Figures/Tables (14): Fig. 1 to Fig. 9, Table 1, Fig. 10 to Fig. 13

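Parameter	Description	Typical range	Value used in this paper
epsilon	Used in the PPO-clip algorithm to limit the difference between the new and old policies during a policy update	0.1~0.3	0.2
learning rate	Learning rate of the neural network optimizer; controls how fast the network weights are updated	0.00001~0.001	0.00001
batch size	Number of samples drawn in each training step	64~512	320
buffer_size	Number of experiences collected (observations, actions, and rewards) for subsequent training	2,048~409,600	2,400
clip range	Truncation range used in the PPO-clip algorithm to control the policy update step size	0.1~0.3	0.25
value function coefficient	Weight of the value function in the total loss function	0.5~1.0	0.7
entropy coefficient	Weight of the policy entropy in the total loss function; encourages exploration	0.001~0.01	0.01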
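How several of these hyperparameters enter the training objective can be sketched with the standard PPO-clip loss from Schulman et al. The source page does not give the paper's loss formula, so the function below is an assumption rather than the authors' code. The name `ppo_loss` and its argument layout are hypothetical. The defaults reuse the table values for epsilon (0.2), value function coefficient (0.7), and entropy coefficient (0.01); how the separately listed clip range of 0.25 is used is not stated, so it is left out.

```python
import math
from typing import Sequence


def ppo_loss(new_logp: Sequence[float], old_logp: Sequence[float],
             advantages: Sequence[float], values: Sequence[float],
             returns: Sequence[float], entropies: Sequence[float],
             clip_eps: float = 0.2, vf_coef: float = 0.7,
             ent_coef: float = 0.01) -> float:
    """Standard PPO-clip loss over a batch (to be minimized).

    loss = -mean(min(r * A, clip(r, 1 - eps, 1 + eps) * A))
           + vf_coef * mean((V - R)^2)
           - ent_coef * mean(entropy)
    where r = exp(new_logp - old_logp) is the probability ratio.
    """
    n = len(advantages)
    surrogate_sum = 0.0
    value_sum = 0.0
    for lp_new, lp_old, adv, v, ret in zip(new_logp, old_logp,
                                           advantages, values, returns):
        ratio = math.exp(lp_new - lp_old)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic bound: take the smaller of the two surrogates.
        surrogate_sum += min(ratio * adv, clipped * adv)
        value_sum += (v - ret) ** 2

    policy_loss = -surrogate_sum / n
    value_loss = value_sum / n
    mean_entropy = sum(entropies) / n
    return policy_loss + vf_coef * value_loss - ent_coef * mean_entropy
```

With a positive advantage, a ratio above 1 + epsilon earns no extra credit, so the policy has no incentive to move further from the old one in a single update. With a negative advantage the unclipped term is kept, so overly large steps are still penalized.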
