Flipper Control Method for Tracked Robot Based on Deep Reinforcement Learning

doi:10.16182/j.issn1004731x.joss.22-1105

Abstract

Tracked robots with flippers can adapt to a range of terrain, and autonomous flipper control is important for raising the level of intelligent operation of such robots in complex environments. Combining expert obstacle-crossing knowledge with technical indicators, the flipper control problem is modeled as a Markov decision process (MDP), and a simulation environment for obstacle-crossing training is built on the physics simulation engine Pymunk. A deep reinforcement learning flipper control algorithm based on the D3QN (dueling double DQN) network model is proposed. It takes terrain information and the robot state as input and outputs the rotation angles of the robot's four flippers (front and rear), which enables self-learned flipper control on challenging terrain. The learned control policy is compared with manual operation in the Gazebo 3D simulation environment. The results show that the proposed algorithm gives the flippers an adaptive adjustment ability and lets the robot traverse complex terrain more efficiently than manual operation.

Key words: tracked robot, autonomous flipper control, autonomous obstacle crossing, deep reinforcement learning (DRL), robot operation
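
The abstract names D3QN (dueling double DQN) without giving network details on this page. As a rough illustration only, the sketch below shows the two ingredients the name refers to: the dueling aggregation Q(s, a) = V(s) + A(s, a) - mean A(s, ·), and the double-DQN target, in which the online network selects the next action and the target network evaluates it. The function names and the example numbers are hypothetical; only the discount factor 0.96 is taken from the parameter table on this page, and nothing here should be read as the authors' implementation.

```python
def dueling_q(value, advantages):
    """Dueling aggregation for one state: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward, done, q_next_online, q_next_target, gamma=0.96):
    """Double-DQN target for one transition.

    The online network's Q-values choose the next action;
    the target network's Q-values score that action.
    """
    if done:
        return reward
    best_action = max(range(len(q_next_online)), key=q_next_online.__getitem__)
    return reward + gamma * q_next_target[best_action]


# Hypothetical numbers for a state with three discrete flipper actions
q_next_online = dueling_q(1.0, [0.0, 1.0, 2.0])   # online net, next state
q_next_target = [0.2, 0.4, 0.6]                   # target net, next state
target = double_dqn_target(1.0, False, q_next_online, q_next_target)
terminal_target = double_dqn_target(-1.0, True, q_next_online, q_next_target)
print(q_next_online, target, terminal_target)
```

Decoupling action selection from action evaluation is what distinguishes the double-DQN target from the plain DQN target, which takes the maximum of the target network's own values.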

CLC number: TP242.6

Pan Hainan, Chen Bailiang, Huang Kaihong, et al. Flipper Control Method for Tracked Robot Based on Deep Reinforcement Learning[J]. Journal of System Simulation, 2024, 36(2): 405-414.

Figures/Tables (15)

Figures 1-6

Table 1

Table 2. Parameters of the flipper control algorithm

$\xi_1$	$\xi_2$	$\xi_3$	$\xi_4$	$t_{\max}$	$\gamma$	$l_r$
0.04	0.1	10	3	400	0.96	0.0005
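
To give a feel for two of the tabulated values, the snippet below works out what a discount factor of 0.96 implies over an episode of at most 400 steps. Reading t_max as the maximum number of steps per episode is an assumption, since this page does not define the symbols, and the helper name is hypothetical.

```python
GAMMA = 0.96   # discount factor from the parameter table
T_MAX = 400    # ASSUMPTION: maximum number of steps per episode


def discounted_return(rewards, gamma=GAMMA):
    """Sum of gamma**t * r_t over a reward sequence, accumulated backwards."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Rough number of steps over which rewards still matter: 1 / (1 - gamma)
effective_horizon = 1.0 / (1.0 - GAMMA)

# Return of a constant reward of 1 per step over a full-length episode
full_episode_return = discounted_return([1.0] * T_MAX)

print(effective_horizon, full_episode_return)
```

Under these assumptions the effective horizon is about 25 steps, far shorter than the episode cap, so the learned values are dominated by rewards in the near future.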

Figures 7-10

Table 3

Figure 11

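Table 4. Comparison of metrics between the proposed algorithm and manual operation

Metric	Manual operation, 0.4 m single step	Manual operation, steep stairs	Proposed algorithm, 0.4 m single step	Proposed algorithm, steep stairs
$t_{cost}$/s	46.17	76.53	28.07	41.21
$\hat{\theta}_R$/rad	4.02	6.30	3.43	4.53
$\hat{\theta}_{R,\max} - \hat{\theta}_{R,\min}$/rad	2.92	3.88	1.11	2.60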
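
The comparison is easier to read as relative reductions. The short script below recomputes them from the tabulated values; the dictionary keys and the function name are hypothetical labels chosen for illustration, and the metrics are not interpreted beyond their symbols.

```python
# Values copied from the comparison table as (manual operation, proposed algorithm)
table = {
    "0.4 m single step": {
        "t_cost_s": (46.17, 28.07),
        "theta_R_rad": (4.02, 3.43),
        "theta_R_range_rad": (2.92, 1.11),
    },
    "steep stairs": {
        "t_cost_s": (76.53, 41.21),
        "theta_R_rad": (6.30, 4.53),
        "theta_R_range_rad": (3.88, 2.60),
    },
}


def relative_reduction(manual, proposed):
    """Percentage by which the proposed value is lower than the manual one."""
    return (manual - proposed) / manual * 100.0


reductions = {
    terrain: {name: relative_reduction(*pair) for name, pair in metrics.items()}
    for terrain, metrics in table.items()
}

for terrain, metrics in reductions.items():
    for name, pct in metrics.items():
        print(f"{terrain:18s} {name:18s} {pct:5.1f}% lower")
```

By this calculation the traversal time falls by roughly 39% on the single step and roughly 46% on the steep stairs relative to manual operation.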


Terrain scenario	Crossing direction	Step height/m	Step length/m	Number of steps	Stair slope/(°)
0.4 m single step	Up/down	0.4	3.0	1	
Steep stairs	Up	0.2	0.3	6	33.7
Steep stairs	Down	0.2	0.2	6	45.0
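
The slope column is consistent with the angle whose tangent is step height divided by step length. A minimal check, assuming that "step length" means the horizontal tread depth (the function name is hypothetical):

```python
import math


def stair_slope_deg(step_height_m, step_length_m):
    """Stair slope in degrees from the rise and run of a single step."""
    return math.degrees(math.atan2(step_height_m, step_length_m))


up_slope = stair_slope_deg(0.2, 0.3)     # upward steep stairs
down_slope = stair_slope_deg(0.2, 0.2)   # downward steep stairs
print(round(up_slope, 1), round(down_slope, 1))
```

Rounded to one decimal place, these reproduce the 33.7° and 45.0° entries in the table.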
