Journal of System Simulation ›› 2024, Vol. 36 ›› Issue (2): 405-414. doi: 10.16182/j.issn1004731x.joss.22-1105


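Flipper Control Method for Tracked Robots Based on Deep Reinforcement Learning

Pan Hainan, Chen Bailiang, Huang Kaihong, Ren Junkai, Cheng Chuang, Lu Huimin, Zhang Hui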

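  1. College of Intelligence Science and Technology, National University of Defense Technology, Changsha, Hunan 410073, China
  • Received: 2022-09-20 Revised: 2022-12-11 Online: 2024-02-15 Published: 2024-02-04
  • Corresponding author: Huang Kaihong E-mail: phn@nudt.edu.cn; kaihong.huang@nudt.edu.cn
  • About the first author: Pan Hainan (born 1998), male, master's student; research interests: robot motion planning and control. E-mail: phn@nudt.edu.cn
  • Funding:
    Key Program of the Joint Funds of the National Natural Science Foundation of China (U1813205)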

Flipper Control Method for Tracked Robot Based on Deep Reinforcement Learning

Pan Hainan, Chen Bailiang, Huang Kaihong, Ren Junkai, Cheng Chuang, Lu Huimin, Zhang Hui

  1. College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China
  • Received: 2022-09-20 Revised: 2022-12-11 Online: 2024-02-15 Published: 2024-02-04
  • Contact: Huang Kaihong E-mail: phn@nudt.edu.cn; kaihong.huang@nudt.edu.cn

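Abstract:

Tracked robots equipped with flippers have a degree of terrain adaptability, and autonomous flipper control matters for raising the level of intelligent operation of such robots in complex environments. Drawing on expert obstacle-crossing knowledge and technical indicators, the flipper control problem is modeled as a Markov decision process (MDP), and a simulation environment for obstacle-crossing training is built on the physics simulation engine Pymunk. A deep reinforcement learning flipper control algorithm based on the D3QN (dueling double DQN) network model is proposed. It takes terrain information and the robot state as input and outputs the rotation angles of the robot's four front and rear flippers, which enables self-learned flipper control of tracked robots on challenging terrain. The control policy learned by the algorithm is compared experimentally with manual operation in the Gazebo 3D simulation environment, and the results show that the proposed algorithm traverses complex terrain more efficiently than manual operation.

Key words: tracked robot, autonomous flipper control, autonomous obstacle crossing, deep reinforcement learning, robot operation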

Abstract:

Tracked robots with flippers have a degree of terrain adaptability, and realizing autonomous flipper control is important for improving the intelligent operation of robots in complex environments. Combining expert experience in obstacle crossing with optimization indicators, the robot's flipper control problem is modeled as a Markov decision process (MDP), and a training simulation environment is built on the physics simulation engine Pymunk. A deep reinforcement learning control algorithm based on a dueling double DQN (D3QN) network is proposed for controlling the flippers. Taking terrain information and the robot state as input and the angles of the four flippers as output, the algorithm achieves self-learned control of the flippers on challenging terrain. The learned flipper control policy is compared with manual operation in the Gazebo 3D simulation environment. The results show that the proposed algorithm gives the robot's flippers an adaptive adjustment ability, which helps the robot traverse complex terrain more efficiently.

Key words: tracked robot, flipper autonomous control, autonomous traversal, DRL, robot operation
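
The abstract names D3QN (dueling double DQN) but gives no network or training details on this page. As a rough illustration only, the two generic ingredients behind that name can be sketched in plain Python: the dueling aggregation Q(s, a) = V(s) + A(s, a) - mean(A), and the double DQN target, where the online network selects the next action and the target network evaluates it. The function names, the three-action toy values, the reward, and the discount factor below are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def dueling_q_values(state_value: float, advantages: Sequence[float]) -> List[float]:
    """Combine V(s) and A(s, a) into Q(s, a) using the mean-subtracted dueling form."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + adv - mean_adv for adv in advantages]


def double_dqn_target(reward: float, done: bool, gamma: float,
                      next_q_online: Sequence[float],
                      next_q_target: Sequence[float]) -> float:
    """Double DQN target: the online net picks the action, the target net scores it."""
    if done:
        # No bootstrapping past a terminal transition.
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# Hypothetical next-state outputs for a 3-action toy problem
# (not the paper's flipper action space).
q_online = dueling_q_values(1.0, [0.5, -0.5, 0.0])
q_target = dueling_q_values(0.8, [0.1, 0.4, -0.5])

# The online net prefers action 0, so the target net's value for action 0 is
# used, even though the target net itself rates action 1 higher.
y = double_dqn_target(reward=0.1, done=False, gamma=0.9,
                      next_q_online=q_online, next_q_target=q_target)
```

Decoupling action selection from action evaluation in this way is what distinguishes the double DQN target from the plain DQN target, which would take the maximum of the target network's own values.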

CLC number: