Reinforcement-learning-based Adaptive Tracking Control for a Space Continuum Robot

doi:10.16182/j.issn1004731x.joss.21-0632

Journal of System Simulation, 2022, Vol. 34, Issue (10): 2264-2271.

• Simulation Support Platform / System Technology •

Da Jiang¹, Zhiqin Cai¹, Zhongzhen Liu¹, Haijun Peng¹,², Zhigang Wu²

1. Dalian University of Technology, Dalian 116024, China
2. State Key Laboratory of Structural Analysis for Industrial Equipment, Dalian 116024, China

Received: 2021-07-07 Revised: 2021-09-12 Online: 2022-10-30 Published: 2022-10-18
Contact: Zhiqin Cai E-mail: ziangdar@sina.com; zhqcai@dlut.edu.cn
About the first author: Da Jiang (born 1992), male, PhD candidate; research interests: dynamics and control of space robots. E-mail: ziangdar@sina.com
Supported by:
Key Program of the Major Research Plan of the National Natural Science Foundation of China (91748203); Excellent Young Scientists Fund of the National Natural Science Foundation of China (11922203)

Abstract:

To address tracking control of a three-segment space continuum robot in active space debris removal, an adaptive sliding mode control algorithm based on deep reinforcement learning is proposed. Using a data-driven modeling approach, a BP neural network models the three-segment continuum manipulator and serves as a predictive model that guides reinforcement learning in adjusting the parameters of the sliding mode controller online, thereby achieving real-time tracking control of the robot's motion. Simulation results show that the data-driven predictive model keeps the relative prediction error within $\pm 1\%$ on random trajectories and therefore reflects the system's dynamic characteristics with high accuracy. Compared with a fixed-parameter sliding mode controller, the proposed adaptive controller reaches the control objective with lower overshoot and a shorter settling time, giving better tracking performance.

Key words: space continuum robot, reinforcement learning, predictive control, sliding mode control, trajectory tracking
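
The abstract reports three performance measures: relative prediction error, overshoot, and settling time. The following is a minimal sketch of how such measures are commonly computed from sampled data. It is not the authors' code: the function names, the 2% settling band, and the second-order test response are illustrative assumptions, and the paper may define these quantities differently.

```python
import numpy as np


def relative_error(predicted, actual, eps=1e-12):
    """Element-wise relative error (predicted - actual) / |actual|."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # eps guards against division by zero when the true value is 0.
    return (predicted - actual) / np.maximum(np.abs(actual), eps)


def overshoot_percent(response, target):
    """Peak excursion past a nonzero target, in percent of the target.

    Assumes a step response that starts at 0 and moves toward the target.
    """
    response = np.asarray(response, dtype=float)
    peak = response.max() if target > 0 else response.min()
    return max(0.0, float((peak - target) / target * 100.0))


def settling_time(t, response, target, band=0.02):
    """First sample time after which |response - target| stays within band*|target|."""
    t = np.asarray(t, dtype=float)
    response = np.asarray(response, dtype=float)
    outside = np.abs(response - target) > band * abs(target)
    if not outside.any():
        return float(t[0])          # already settled at the first sample
    last = int(np.nonzero(outside)[0][-1])
    if last == len(t) - 1:
        return float("nan")         # never settles inside the recorded window
    return float(t[last + 1])


# Illustrative test signal (an assumption, not data from the paper):
# unit-step response of an underdamped second-order system.
zeta, wn = 0.5, 2.0
wd = wn * np.sqrt(1.0 - zeta**2)
t = np.arange(0.0, 10.0, 0.001)
y = 1.0 - np.exp(-zeta * wn * t) * (
    np.cos(wd * t) + zeta / np.sqrt(1.0 - zeta**2) * np.sin(wd * t)
)

print("overshoot [%]:", overshoot_percent(y, 1.0))
print("2% settling time [s]:", settling_time(t, y, 1.0))
print("max |relative error|:", np.abs(relative_error([1.01, 1.98], [1.0, 2.0])).max())
```

Applying the same three functions to the tracking responses of a fixed-parameter and an adaptive controller would allow the kind of comparison described in the abstract, provided both responses are sampled on the same time grid.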

CLC Number: TP273.2

Cite this article:

Da Jiang, Zhiqin Cai, Zhongzhen Liu, et al. Reinforcement-learning-based Adaptive Tracking Control for a Space Continuum Robot[J]. Journal of System Simulation, 2022, 34(10): 2264-2271.

Figures/Tables: 8 (Figures 1-8)

