Robot Arm Control Method Based on Deep Reinforcement Learning

doi:10.16182/j.issn1004731x.joss.19-FZ0378

Journal of System Simulation ›› 2019, Vol. 31 ›› Issue (11): 2452-2457.

Li Heyu (李鹤宇)¹, Zhao Zhilong (赵志龙)¹,²,³, Gu Lei (顾蕾)¹, Guo Liqin (郭丽琴)¹,²,³, Zeng Bi (曾贲)¹, Lin Tingyu (林廷宇)¹,²,³

1. Beijing Complex Product Advanced Manufacturing Engineering Research Center, Beijing Simulation Center, Beijing 100854, China;
2. State Key Laboratory of Intelligent Manufacturing System Technology, Beijing Institute of Electronic System Engineering, Beijing 100854, China;
3. Science and Technology on Space System Simulation Laboratory, Beijing Simulation Center, Beijing 100854, China

Received: 2019-05-21 Revised: 2019-07-25 Online: 2019-11-10 Published: 2019-12-13
About the authors: Li Heyu (b. 1993), male, from Shijiazhuang, Hebei; master's degree; research interests: deep reinforcement learning, modeling and simulation technology. Zhao Zhilong (b. 1987), male, from Langfang, Hebei; master's degree, assistant engineer; research interests: virtual prototyping, intelligent manufacturing, etc.
Supported by: National Key R&D Program of China (2018YFB1004005)

Abstract

Abstract: Deep reinforcement learning explores its environment by trial and error and adjusts the neural network parameters according to a reward function. An actual production line cannot serve as the trial-and-error environment for the algorithm, so it cannot provide enough data. To address this, this paper constructs a robot arm simulation environment consisting of two parts, the robot arm and the object. State variables and a reward mechanism are set according to the goal, and the Deep Deterministic Policy Gradient (DDPG) algorithm is trained in this model, so that a deep reinforcement learning algorithm controls the robot arm and moves the gripper to a position below the object. The approach improves the adaptability of the control algorithm and shortens debugging time. Experimental results show that the deep learning algorithm converges in a shorter time and achieves control of the robot arm.

Key words: system simulation, Unity, reinforcement learning, neural network
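
The abstract states that state variables and a reward mechanism are set according to the goal of moving the gripper below the object, but it does not give their form. The following is a minimal sketch of what such a setup could look like, not the authors' implementation. Every name and value in it (`ReachConfig`, `offset_below`, `tolerance`, `success_bonus`, the choice of y as the vertical axis, the distance-based reward) is an assumption made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class ReachConfig:
    # All values are illustrative assumptions, not taken from the paper.
    offset_below: float = 0.1   # how far below the object the goal point lies
    tolerance: float = 0.02     # distance at which the goal counts as reached
    success_bonus: float = 1.0  # extra reward granted on reaching the goal


def target_position(obj: Vec3, cfg: ReachConfig) -> Vec3:
    """Goal point directly below the object (y is assumed to be vertical)."""
    x, y, z = obj
    return (x, y - cfg.offset_below, z)


def build_state(joint_angles: List[float], gripper: Vec3, obj: Vec3,
                cfg: ReachConfig) -> List[float]:
    """Concatenate joint angles, gripper position, and gripper-to-goal offset."""
    goal = target_position(obj, cfg)
    delta = [g - p for g, p in zip(goal, gripper)]
    return [*joint_angles, *gripper, *delta]


def reward(gripper: Vec3, obj: Vec3, cfg: ReachConfig) -> Tuple[float, bool]:
    """Negative distance to the goal, plus a bonus once within tolerance."""
    goal = target_position(obj, cfg)
    dist = math.dist(gripper, goal)
    done = dist <= cfg.tolerance
    return (-dist + (cfg.success_bonus if done else 0.0), done)
```

A dense, distance-based reward of this kind gives the agent a learning signal at every step, which is one common way to help a continuous-control method such as DDPG converge; whether the paper uses a dense or sparse reward is not stated in the abstract.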

CLC number: TP391

Li Heyu, Zhao Zhilong, Gu Lei, Guo Liqin, Zeng Bi, Lin Tingyu. Robot Arm Control Method Based on Deep Reinforcement Learning[J]. Journal of System Simulation, 2019, 31(11): 2452-2457.
