Journal of System Simulation ›› 2019, Vol. 31 ›› Issue (11): 2452-2457. doi: 10.16182/j.issn1004731x.joss.19-FZ0378

• Simulation Application Engineering •

Robot Arm Control Method Based on Deep Reinforcement Learning

Li Heyu1, Zhao Zhilong1,2,3, Gu Lei1, Guo Liqin1,2,3, Zeng Bi1, Lin Tingyu1,2,3

  1. Beijing Complex Product Advanced Manufacturing Engineering Research Center, Beijing Simulation Center, Beijing 100854, China;
  2. State Key Laboratory of Intelligent Manufacturing System Technology, Beijing Institute of Electronic System Engineering, Beijing 100854, China;
  3. Science and Technology on Space System Simulation Laboratory, Beijing Simulation Center, Beijing 100854, China
  • Received: 2019-05-21 Revised: 2019-07-25 Online: 2019-11-10 Published: 2019-12-13
  • About the authors: Li Heyu (b. 1993), male, from Shijiazhuang, Hebei, master's degree; research interests: deep reinforcement learning, modeling and simulation technology. Zhao Zhilong (b. 1987), male, from Langfang, Hebei, master's degree, assistant engineer; research interests: virtual prototyping, intelligent manufacturing, etc.
  • Supported by:
    National Key R&D Program of China (2018YFB1004005)

Abstract: Deep reinforcement learning explores its environment by trial and error and adjusts the neural network parameters according to a reward function. An actual production line cannot serve as the trial-and-error environment for the algorithm, so it cannot provide enough data. This paper therefore constructs a robot arm simulation environment consisting of the robot arm and an object, and sets the state variables and reward mechanism according to the control goal. The Deep Deterministic Policy Gradient (DDPG) algorithm is trained in this environment so that it controls the robot arm and moves the gripper to a position below the object, which improves the adaptability of the control algorithm and shortens debugging time. The experimental results show that, in the environment constructed in this paper, the deep learning algorithm converges in a shorter time and controls the robot arm to achieve the specified goal.

Key words: system simulation, Unity, reinforcement learning, neural network
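The abstract does not list the state variables or the reward function, so the following is only a minimal sketch of how such a setup might look, assuming a hypothetical planar two-link arm. The names `ArmState`, `target_below`, `observe`, and `reward`, as well as the offset, tolerance, and bonus values, are illustrative assumptions and are not taken from the paper. The `soft_update` helper shows the soft target-network update that DDPG uses in general, written here for plain lists of weights.

```python
import math
from dataclasses import dataclass


@dataclass
class ArmState:
    """Hypothetical planar two-link arm (not the paper's model); angles in radians."""
    theta1: float
    theta2: float
    link1: float = 1.0
    link2: float = 1.0

    def gripper_position(self):
        """Forward kinematics for the gripper at the end of the second link."""
        a, b = self.theta1, self.theta1 + self.theta2
        x = self.link1 * math.cos(a) + self.link2 * math.cos(b)
        y = self.link1 * math.sin(a) + self.link2 * math.sin(b)
        return x, y


def target_below(obj_xy, offset=0.1):
    """Goal point an assumed fixed distance directly below the object."""
    return obj_xy[0], obj_xy[1] - offset


def observe(arm, obj_xy, offset=0.1):
    """Assumed state vector: joint angles plus gripper-to-goal displacement."""
    gx, gy = arm.gripper_position()
    tx, ty = target_below(obj_xy, offset)
    return [arm.theta1, arm.theta2, tx - gx, ty - gy]


def reward(arm, obj_xy, offset=0.1, tolerance=0.05, bonus=1.0):
    """Assumed reward: negative distance to the goal, plus a bonus within tolerance."""
    _, _, dx, dy = observe(arm, obj_xy, offset)
    distance = math.hypot(dx, dy)
    return -distance + (bonus if distance <= tolerance else 0.0)


def soft_update(target_params, online_params, tau=0.01):
    """DDPG soft target update: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target_params, online_params)]
```

In this sketch the reward grows as the gripper approaches the point below the object, which gives the agent a dense learning signal. A sparse reward (bonus only) would also be possible but typically needs more exploration.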

CLC number: