Research on Output Feedback Control Based on Reinforcement Learning for Overhead Crane

doi:10.16182/j.issn1004731x.joss.25-0625

Abstract:

An output feedback control algorithm based on reinforcement learning is designed for the optimal control problem of overhead crane systems. A high-gain observer (HGO) is constructed from the system output data to estimate the unmeasurable states of the overhead crane. Based on the states estimated by the HGO, a policy iteration (PI) method with integral reinforcement learning is developed, in which Critic and Actor neural networks approximate the optimal value function and the control policy, respectively, and their weights are tuned in real time by online adaptive laws. Using Lyapunov stability theory, the system states, the state observation error, and the neural network weight estimation errors are proven to be uniformly ultimately bounded, which guarantees the stability of the closed-loop system and yields a suboptimal control policy. Simulation results show that the proposed algorithm achieves accurate trolley positioning with small payload swing even when the system states are not fully measurable.

Key words: overhead crane, reinforcement learning, high-gain observer, output feedback, policy iteration
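The state-estimation step summarized in the abstract can be illustrated with a minimal high-gain observer sketch. This is not the paper's crane model or observer design. The toy second-order plant, the gains `a1` and `a2`, the small parameter `eps`, and the function name `simulate_hgo` are all assumptions chosen for illustration. Only the first state is measured, and the observer reconstructs the second state from it.

```python
def simulate_hgo(eps=0.05, a1=2.0, a2=1.0, dt=1e-3, t_end=2.0):
    """Euler simulation of a high-gain observer on a toy plant.

    Assumed plant (illustrative only, not the crane dynamics):
        x1' = x2,   x2' = -x1 - 0.5 * x2
    Measured output: y = x1. The state x2 is treated as unmeasurable.
    """
    x1, x2 = 1.0, 0.0      # true plant state
    xh1, xh2 = 0.0, 0.0    # observer state, deliberately wrong at t = 0

    for _ in range(int(round(t_end / dt))):
        y = x1                 # only the output is available
        innov = y - xh1        # output estimation error

        # Observer: copy of the nominal model plus high-gain correction.
        # Gains scale as a1/eps and a2/eps^2; s^2 + a1*s + a2 is Hurwitz.
        dxh1 = xh2 + (a1 / eps) * innov
        dxh2 = -xh1 - 0.5 * xh2 + (a2 / eps ** 2) * innov

        # Plant dynamics.
        dx1 = x2
        dx2 = -x1 - 0.5 * x2

        # Forward-Euler update of plant and observer with the same step.
        x1, x2 = x1 + dt * dx1, x2 + dt * dx2
        xh1, xh2 = xh1 + dt * dxh1, xh2 + dt * dxh2

    return (x1, x2), (xh1, xh2)


if __name__ == "__main__":
    true_state, estimate = simulate_hgo()
    print("true state:", true_state)
    print("estimate:  ", estimate)
```

A smaller `eps` speeds up convergence of the estimate but amplifies measurement noise and transient peaking, so in an output feedback loop the estimated states would typically feed the learned controller only after this trade-off has been tuned.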

CLC number: TP391.9

Li Minghui, Gao Daoxiang. Research on Output Feedback Control Based on Reinforcement Learning for Overhead Crane[J]. Journal of System Simulation, 2026, 38(6): 1771-1781.

Figures/Tables: 6 (Fig. 1 to Fig. 6)
