Journal of System Simulation (系统仿真学报), 2024, Vol. 36, Issue (6): 1425-1432. DOI: 10.16182/j.issn1004731x.joss.23-0137

Adaptive PID Control Algorithm Based on PPO

Zhou Zhiyong1, Mo Fei1, Zhao Kai2, Hao Yunbo2, Qian Yufeng1

  1. School of Mechanical Engineering, Shanghai Dianji University, Shanghai 201306, China
  2. Shanghai Aerospace Equipment Manufacturing General Factory Co., Ltd., Shanghai 200245, China
  • Received: 2023-02-14    Revised: 2023-04-21    Online: 2024-06-28    Published: 2024-06-19
  • First author: Zhou Zhiyong (b. 1984), male, associate professor, Ph.D.; his research focuses on innovative design theory and methods. E-mail: zhouzhiyong789@tom.com
  • Funding: Major Industrial Technology Research Program of Minhang District, Shanghai (2022MH-ZD20)

Abstract:

A six-axis robotic arm is built with the MATLAB physics engine and Python and simulated in a complex control environment with disturbances, providing a trial-and-error training environment that cannot be obtained in reality. The proximal policy optimization (PPO) algorithm from reinforcement learning is used to improve the traditional PID control algorithm. Drawing on the multi-agent idea, and based on the distinct effects that the three PID parameters have on the control system and on the characteristics of the six-axis robotic arm, the three parameters are trained as separate agents, yielding a new multi-agent adaptive PID algorithm in which the parameters are adjusted adaptively. Simulation results show that the algorithm outperforms MA-DDPG and MA-SAC in training convergence. Compared with the traditional PID algorithm, it suppresses disturbances and oscillations more effectively, achieves lower overshoot and shorter settling time, yields a smoother control process, and effectively improves the control accuracy of the robotic arm, demonstrating the robustness and effectiveness of the proposed algorithm.
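
To make the tuning scheme concrete, the following is a minimal Python sketch, not the authors' implementation: the class AdaptivePID, the placeholder_policy function, and the first-order plant are illustrative assumptions. It only shows the structural idea that three separate agents supply Kp, Ki, and Kd at every control step, while a conventional PID law converts the tracking error into the control signal; in the paper each gain would instead come from its own PPO-trained policy interacting with the simulated six-axis arm.

class AdaptivePID:
    """PID controller whose gains are supplied externally at every step."""

    def __init__(self, dt):
        self.dt = dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error, kp, ki, kd):
        # Standard positional PID law, evaluated with the gains chosen
        # by the three agents for this time step.
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return kp * error + ki * self.integral + kd * derivative


def placeholder_policy(error, base_gain):
    # Stand-in for a trained PPO policy; in the paper's scheme each of
    # Kp, Ki, Kd would be produced by its own policy network.
    return max(0.0, base_gain * (1.0 + 0.1 * abs(error)))


# Toy closed loop on a first-order plant, only to show how the pieces fit.
dt, target, y = 0.01, 1.0, 0.0
pid = AdaptivePID(dt)
for _ in range(2000):
    error = target - y
    kp = placeholder_policy(error, 2.0)   # agent 1 -> Kp
    ki = placeholder_policy(error, 0.5)   # agent 2 -> Ki
    kd = placeholder_policy(error, 0.1)   # agent 3 -> Kd
    u = pid.step(error, kp, ki, kd)
    y += (-y + u) * dt                    # plant: dy/dt = -y + u
print(f"output after 20 s: {y:.3f} (target {target})")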

Key words: reinforcement learning, proximal policy optimization (PPO), adaptive PID tuning, robotic arm, multi-agent

CLC number: