Journal of System Simulation (系统仿真学报) ›› 2026, Vol. 38 ›› Issue (6): 1771-1781. doi: 10.16182/j.issn1004731x.joss.25-0625

• Papers •

Research on Output Feedback Control Based on Reinforcement Learning for Overhead Crane

Li Minghui, Gao Daoxiang

  1. School of Technology, Beijing Forestry University, Beijing 100083, China
  • Received: 2025-07-01 Revised: 2025-08-26 Online: 2026-06-25 Published: 2026-06-25
  • Contact: Gao Daoxiang
  • About the first author: Li Minghui (born 2001), male, master's student; research interests include reinforcement learning and optimal control.

Abstract:

An output feedback control algorithm based on reinforcement learning is designed for the optimal control problem of overhead crane systems. A high-gain observer (HGO) is built from the system output data to estimate the unmeasurable states of the crane. Using the states estimated by the HGO, a policy iteration (PI) method based on integral reinforcement learning is designed, in which Critic and Actor neural networks approximate the optimal value function and the control policy, respectively, and an online adaptive algorithm tunes the network weights in real time. Lyapunov stability theory is used to prove that the system states, the state observation errors, and the neural network weight estimation errors are uniformly ultimately bounded, which guarantees closed-loop stability and yields a suboptimal control policy. Simulation results show that, even when the system states are not fully measurable, the proposed algorithm achieves accurate trolley positioning with small payload swing.

Key words: overhead crane, reinforcement learning, high-gain observer, output feedback, policy iteration (PI)
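
As a rough illustration of the observer idea mentioned in the abstract, the following Python sketch runs a second-order high-gain observer on each measured output (trolley position and swing angle) of a small-angle crane model and recovers the unmeasured velocities. The linearized model, the parameter values, the open-loop force profile, the observer gains, and all function names are assumptions made for this sketch. They are not taken from the paper, whose observer and learning-based controller are not reproduced here.

```python
# Hypothetical parameters, chosen only for illustration (not from the paper).
M, m, l, g = 5.0, 1.0, 1.0, 9.81   # trolley mass, payload mass, rope length, gravity
EPS = 0.01                          # high-gain parameter (smaller = faster observer)
A1, A2 = 2.0, 1.0                   # s^2 + A1*s + A2 must be Hurwitz
DT, T_END = 1e-4, 3.0               # Euler step and simulation horizon (seconds)


def force(t):
    """Assumed open-loop test input: push, brake, then coast."""
    if t < 1.0:
        return 1.0
    if t < 2.0:
        return -1.0
    return 0.0


def crane_accel(theta, F):
    """Accelerations of an assumed small-angle (linearized) crane model."""
    x_dd = (F + m * g * theta) / M
    th_dd = -(F + (M + m) * g * theta) / (M * l)
    return x_dd, th_dd


def hgo_step(est, y, dt):
    """One explicit Euler step of a second-order high-gain observer.

    est is (position estimate, velocity estimate); y is the measured position.
    """
    pos_hat, vel_hat = est
    err = y - pos_hat
    new_pos_hat = pos_hat + dt * (vel_hat + (A1 / EPS) * err)
    new_vel_hat = vel_hat + dt * (A2 / EPS**2) * err
    return new_pos_hat, new_vel_hat


def simulate():
    """Return (true, estimated) velocity pairs for trolley and swing at T_END."""
    x, v, th, w = 0.0, 0.0, 0.0, 0.0            # true state; v and w are unmeasured
    x_est, th_est = (0.05, 0.0), (-0.02, 0.0)   # deliberately wrong initial estimates
    t = 0.0
    for _ in range(int(round(T_END / DT))):
        F = force(t)
        x_dd, th_dd = crane_accel(th, F)
        # The observer only sees the measured outputs x and theta.
        x_est = hgo_step(x_est, x, DT)
        th_est = hgo_step(th_est, th, DT)
        # Explicit Euler step of the plant.
        x, v = x + DT * v, v + DT * x_dd
        th, w = th + DT * w, w + DT * th_dd
        t += DT
    return (v, x_est[1]), (w, th_est[1])


if __name__ == "__main__":
    (v, v_hat), (w, w_hat) = simulate()
    print(f"trolley velocity: true={v:.4f}, estimated={v_hat:.4f}")
    print(f"swing rate:       true={w:.4f}, estimated={w_hat:.4f}")
```

Reducing `EPS` shrinks the steady-state velocity error but amplifies measurement noise and the initial peaking of the estimates, a trade-off that high-gain observers are known for.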

CLC Number: