End-to-end Motion Planning of Unmanned Vehicles Based on Multimodal Deep Reinforcement Learning

doi:10.16182/j.issn1004731x.joss.23-0939

Abstract

When reinforcement learning is applied to robot motion planning, the agent cannot perceive its surroundings or avoid obstacles effectively, so the learned policy does not generalize to complex, challenging terrain. To address this, a multimodal deep reinforcement learning method is proposed for the motion planning of unmanned vehicles; it learns to combine proprioceptive states with high-dimensional depth sensor inputs. Specifically, proprioceptive states provide contact measurements for immediate reaction, while the onboard visual sensor lets the vehicle learn to predict environmental changes and maneuver proactively, several time steps ahead, around obstacles and over uneven terrain. A new end-to-end multimodal Transformer fusion model, TransProAct (transformer-based proactive action), is proposed. Its self-attention mechanism fuses proprioceptive states and visual information, the deep reinforcement learning algorithm PPO trains the vehicle to learn motion planning on its own, and multimodal delay randomization is introduced to narrow the gap between simulation and the real world. Evaluated in challenging simulation environments with different obstacles and uneven terrain, the method significantly improves on the baselines and generalizes considerably better.

Key words: multimodal perception, reinforcement learning, unmanned vehicle, motion planning, neural network
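
The fusion step described in the abstract can be pictured with a minimal sketch of single-head scaled dot-product self-attention over one proprioceptive token and a few depth-image tokens. This is not the TransProAct architecture itself: the embedding width `D`, the number of depth tokens, the random projection matrices, and the mean-pooling at the end are all assumptions chosen for illustration.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def random_matrix(rows, cols, rng):
    """Small random projection matrix (stands in for learned weights)."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def self_attention(tokens, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over token vectors."""
    Q = [matvec(Wq, t) for t in tokens]
    K = [matvec(Wk, t) for t in tokens]
    V = [matvec(Wv, t) for t in tokens]
    d = len(Q[0])
    outputs, weights = [], []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)  # how much this token attends to every token
        weights.append(w)
        outputs.append([sum(wi * v[j] for wi, v in zip(w, V)) for j in range(d)])
    return outputs, weights

rng = random.Random(0)
D = 8  # hypothetical shared embedding width

# Assumed tokenization: one token for the proprioceptive state,
# four tokens for patches of the depth image.
proprio_token = [rng.gauss(0, 1) for _ in range(D)]
depth_tokens = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(4)]
tokens = [proprio_token] + depth_tokens

Wq, Wk, Wv = (random_matrix(D, D, rng) for _ in range(3))
fused, attn = self_attention(tokens, Wq, Wk, Wv)

# Mean-pool the fused tokens into one feature vector for a policy head.
feature = [sum(tok[j] for tok in fused) / len(fused) for j in range(D)]
```

Because every token attends to every other token, the proprioceptive token can draw on the depth tokens and vice versa, which is the property that distinguishes attention-based fusion from plain concatenation of the two inputs.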

CLC number: TP242.6

Ding Kaiyuan, Hamdulla Askar, Zhu Bin, et al. End-to-end Motion Planning of Unmanned Vehicles Based on Multimodal Deep Reinforcement Learning[J]. Journal of System Simulation, 2024, 36(11): 2631-2643.

Figures and tables (14): Figures 1-8 and Tables 1-6.

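Training parameter	Value
Maximum steps per episode	1000
Discount factor	0.99
Clipping parameter	0.2
Batch size	256
Optimizer	Adam
Policy network learning rate	1e-4
Value network learning rate	1e-4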
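
To show where two of the parameters above enter training, here is a minimal sketch of the discounted return and the standard PPO clipped surrogate loss, using the discount factor 0.99 and clipping parameter 0.2 from the table. The function names are hypothetical, and the paper's full objective (value loss, any entropy term, advantage estimator) is not given in this section, so only the generic PPO-Clip policy term is shown.

```python
import math

GAMMA = 0.99     # discount factor (from the table)
CLIP_EPS = 0.2   # clipping parameter (from the table)

def discounted_returns(rewards, gamma=GAMMA):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]

def ppo_clip_loss(new_logp, old_logp, advantages, eps=CLIP_EPS):
    """Negative mean of the PPO clipped surrogate objective."""
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - eps, min(1.0 + eps, ratio))
        # Taking the minimum removes any incentive to push the ratio
        # outside [1 - eps, 1 + eps] in the direction the advantage favors.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)

# Usage: a ratio of 1.5 with positive advantage is clipped to 1.2.
loss = ppo_clip_loss([math.log(1.5)], [0.0], [1.0])
```

With a clipping parameter of 0.2, a single update cannot gain from moving an action's probability ratio beyond 0.8 or 1.2, which keeps each policy step close to the policy that collected the data.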

Method	Distance moved/m	Number of collisions
State-Only	1.1±1.1	—
Depth-Only	7.6±1.3	456.3±62.2
State-Depth-Concat	10.3±1.4	238.2±143.4
HRL	11.6±2.1	206.8±69.4
Ours	14.4±2.5	85.4±101.3
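
As a reading aid for the table above, the following sketch parses the mean±std cells and computes the relative change of Ours against the HRL baseline. The values are copied from the table; the helper names and dictionary layout are hypothetical.

```python
def parse_mean_std(cell):
    """Split a 'mean±std' table cell into two floats."""
    mean, std = cell.split("±")
    return float(mean), float(std)

def relative_change(value, reference):
    """Percentage change of value with respect to reference."""
    return 100.0 * (value - reference) / reference

# Rows copied from the table above.
table = {
    "HRL": {"distance_m": "11.6±2.1", "collisions": "206.8±69.4"},
    "Ours": {"distance_m": "14.4±2.5", "collisions": "85.4±101.3"},
}

ours_dist, _ = parse_mean_std(table["Ours"]["distance_m"])
hrl_dist, _ = parse_mean_std(table["HRL"]["distance_m"])
ours_coll, ours_coll_std = parse_mean_std(table["Ours"]["collisions"])
hrl_coll, _ = parse_mean_std(table["HRL"]["collisions"])

distance_gain = relative_change(ours_dist, hrl_dist)      # positive is better
collision_change = relative_change(ours_coll, hrl_coll)   # negative is better
```

On the mean values this gives roughly a 24% longer distance and about 59% fewer collisions than HRL. The standard deviation of the collision count for Ours (101.3) exceeds its mean (85.4), so the collision comparison should be read with that spread in mind.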

Method	Distance moved/m	Ball reward	Number of collisions
State-Only	1.1±1.1	—	—
Depth-Only	3.6±1.3	80.0±43.0	456.3±262.2
State-Depth-Concat	5.6±2.1	206.4±41.1	226.8±59.5
HRL	10.8±0.8	167.2±52.3	267.2±82.3
Ours	15.2±2.5	233.3±47.1	250.4±71.3

Method	Distance moved/m	Number of collisions
State-Only	1.1±1.1	—
Depth-Only	5.6±1.3	156.3±26.2
State-Depth-Concat	12.6±2.1	96.8±39.5
HRL	7.2±2.6	75.2±13.4
Ours	10.2±2.5	67.8±19.3

Method	Distance moved (3D)
State-Only	2.1±0.3
Depth-Only	3.8±1.4
State-Depth-Concat	4.6±0.9
HRL	6.2±0.3
Ours	6.9±1.2