Journal of System Simulation (系统仿真学报), 2024, Vol. 36, Issue (6): 1425-1432. DOI: 10.16182/j.issn1004731x.joss.23-0137

Adaptive PID Control Algorithm Based on PPO

Zhou Zhiyong1, Mo Fei1, Zhao Kai2, Hao Yunbo2, Qian Yufeng1

  1. School of Mechanical Engineering, Shanghai Dianji University, Shanghai 201306, China
  2. Shanghai Aerospace Equipment Manufacturing General Factory Co., Ltd., Shanghai 200245, China
  • Received: 2023-02-14    Revised: 2023-04-21    Online: 2024-06-28    Published: 2024-06-19
  • First author: Zhou Zhiyong (b. 1984), male, associate professor, Ph.D.; his research focuses on innovative design theory and methods. E-mail: zhouzhiyong789@tom.com
  • Funding: Major Industrial Technology Research Program of Minhang District, Shanghai (2022MH-ZD20)

Abstract:

A six-axis robotic arm is built with the MATLAB physics engine and Python and simulated in a complex control environment with disturbances, providing a trial-and-error training environment that cannot be obtained in reality. The proximal policy optimization (PPO) algorithm from reinforcement learning is used to improve the traditional PID control algorithm. Drawing on the multi-agent idea, and based on the distinct effects that the three PID parameters have on the control system and on the characteristics of the six-axis robotic arm, the three parameters are trained as separate agents, yielding a new multi-agent adaptive PID algorithm in which the parameters are adjusted adaptively. Simulation results show that the algorithm outperforms MA-DDPG and MA-SAC in training convergence. Compared with the traditional PID algorithm, it suppresses disturbances and oscillations more effectively, achieves lower overshoot and shorter settling time, yields a smoother control process, and effectively improves the control accuracy of the robotic arm, demonstrating the robustness and effectiveness of the proposed algorithm.
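
To make the tuning scheme concrete, the following is a minimal Python sketch, not the authors' implementation: the class AdaptivePID, the placeholder_policy function, and the first-order plant are illustrative assumptions. It only shows the structural idea that three separate agents supply Kp, Ki, and Kd at every control step, while a conventional PID law converts the tracking error into the control signal; in the paper each gain would instead come from its own PPO-trained policy interacting with the simulated six-axis arm.

class AdaptivePID:
    """PID controller whose gains are supplied externally at every step."""

    def __init__(self, dt):
        self.dt = dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error, kp, ki, kd):
        # Standard positional PID law, evaluated with the gains chosen
        # by the three agents for this time step.
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return kp * error + ki * self.integral + kd * derivative


def placeholder_policy(error, base_gain):
    # Stand-in for a trained PPO policy; in the paper's scheme each of
    # Kp, Ki, Kd would be produced by its own policy network.
    return max(0.0, base_gain * (1.0 + 0.1 * abs(error)))


# Toy closed loop on a first-order plant, only to show how the pieces fit.
dt, target, y = 0.01, 1.0, 0.0
pid = AdaptivePID(dt)
for _ in range(2000):
    error = target - y
    kp = placeholder_policy(error, 2.0)   # agent 1 -> Kp
    ki = placeholder_policy(error, 0.5)   # agent 2 -> Ki
    kd = placeholder_policy(error, 0.1)   # agent 3 -> Kd
    u = pid.step(error, kp, ki, kd)
    y += (-y + u) * dt                    # plant: dy/dt = -y + u
print(f"output after 20 s: {y:.3f} (target {target})")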

Key words: reinforcement learning, proximal policy optimization (PPO), adaptive PID tuning, robotic arm, multi-agent

CLC number: