End-to-end Motion Planning of Unmanned Vehicles Based on Multimodal Deep Reinforcement Learning

doi:10.16182/j.issn1004731x.joss.23-0939

Abstract

Reinforcement learning has generalized poorly to robot motion planning on challenging terrain because the agent cannot perceive its surroundings and therefore fails to avoid obstacles. To address this, a multimodal deep reinforcement learning approach to unmanned-vehicle motion planning is proposed that learns to fuse proprioceptive states with high-dimensional depth-sensor inputs. Proprioceptive states provide contact measurements for immediate reaction, while the onboard visual sensors let the vehicle learn to anticipate environmental changes and proactively steer around obstacles and uneven terrain many time steps in advance. A novel end-to-end multimodal Transformer fusion model, TransProAct (Transformer-based proactive action), is proposed: it fuses proprioceptive states and visual data through self-attention, and the deep reinforcement learning algorithm PPO (proximal policy optimization) is then used to train the vehicle's motion-planning policy. In addition, multimodal delay randomization is introduced to narrow the gap between simulation and reality. In tests across challenging simulated environments with a variety of obstacles and uneven ground, the proposed approach shows notable gains over the baselines and markedly better generalization.
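The abstract states that TransProAct fuses proprioceptive states and depth data through self-attention but does not give the layer layout. The sketch below illustrates the general idea under stated assumptions: the proprioceptive vector becomes one token, the depth image is cut into non-overlapping patches that each become a token, and a single-head scaled dot-product self-attention layer mixes all tokens before they are mean-pooled into one feature vector for the PPO policy and value heads. The class name `SelfAttentionFusion`, the helper `depth_to_patches`, the dimensions, the patch size, and the random untrained weights are hypothetical choices for illustration, not the authors' architecture.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def depth_to_patches(depth, patch):
    # Split an (H, W) depth map into non-overlapping patch x patch tiles and
    # flatten each tile into one row: shape (num_patches, patch * patch).
    # H and W are assumed to be divisible by patch.
    h, w = depth.shape
    tiles = depth.reshape(h // patch, patch, w // patch, patch)
    return tiles.transpose(0, 2, 1, 3).reshape(-1, patch * patch)


class SelfAttentionFusion:
    """Hypothetical single-head fusion layer (not the paper's exact model)."""

    def __init__(self, state_dim, patch, d_model, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(d_model)
        self.patch = patch
        self.d_model = d_model
        # Modality-specific linear embeddings map each input to d_model-dim tokens.
        self.w_state = rng.normal(0.0, scale, (state_dim, d_model))
        self.w_patch = rng.normal(0.0, scale, (patch * patch, d_model))
        # Query/key/value projections shared by all tokens.
        self.w_q = rng.normal(0.0, scale, (d_model, d_model))
        self.w_k = rng.normal(0.0, scale, (d_model, d_model))
        self.w_v = rng.normal(0.0, scale, (d_model, d_model))

    def __call__(self, state, depth):
        state_tok = (state @ self.w_state)[None, :]                      # (1, d_model)
        depth_tok = depth_to_patches(depth, self.patch) @ self.w_patch   # (P, d_model)
        tokens = np.concatenate([state_tok, depth_tok], axis=0)          # (1 + P, d_model)
        q, k, v = tokens @ self.w_q, tokens @ self.w_k, tokens @ self.w_v
        # Scaled dot-product attention: every token attends to tokens of both modalities.
        attn = softmax(q @ k.T / np.sqrt(self.d_model), axis=-1)
        fused = tokens + attn @ v            # residual connection
        return fused.mean(axis=0), attn      # pooled feature and attention map


fusion = SelfAttentionFusion(state_dim=12, patch=16, d_model=32)
feature, attn = fusion(np.zeros(12), np.ones((64, 64)))
print(feature.shape, attn.shape)  # (32,) (17, 17)
```

Each row of `attn` is a probability distribution over the 17 tokens (one state token plus 16 depth patches), so it shows how strongly each token draws on proprioceptive versus visual information for a given input.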
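The abstract names multimodal delay randomization without describing it. One plausible reading, sketched below purely as an assumption, is that during simulated training each modality is served to the policy with its own randomly sampled latency, measured in control steps, so that the policy learns to tolerate the lag that real proprioceptive and depth sensors can exhibit. The class `MultimodalDelayRandomizer`, the uniform integer delays, and the per-step resampling are hypothetical.

```python
import random
from collections import deque


class MultimodalDelayRandomizer:
    """Hypothetical sketch: independent random observation latency per modality."""

    def __init__(self, max_delays, seed=0):
        # max_delays maps a modality name to its maximum delay in control steps.
        self.max_delays = dict(max_delays)
        self.rng = random.Random(seed)
        # Keep just enough history to serve the largest delay of each modality.
        self.buffers = {m: deque(maxlen=d + 1) for m, d in self.max_delays.items()}

    def reset(self):
        # Call at the start of every episode so stale frames never leak across episodes.
        for buf in self.buffers.values():
            buf.clear()

    def step(self, observation):
        """Store the fresh observation and return a randomly delayed copy of each modality."""
        delayed = {}
        for m, value in observation.items():
            buf = self.buffers[m]
            buf.append(value)
            # Sample a delay for this modality, clamped to the history collected so far.
            delay = min(self.rng.randint(0, self.max_delays[m]), len(buf) - 1)
            delayed[m] = buf[-1 - delay]
        return delayed


# Usage: depth frames may lag by up to 3 steps, proprioception by at most 1.
randomizer = MultimodalDelayRandomizer({"state": 1, "depth": 3})
for t in range(5):
    print(t, randomizer.step({"state": t, "depth": t}))
```

Sampling the two delays independently means the policy cannot assume that the state and depth readings it receives were captured at the same instant.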

Key words: multimodal perception, reinforcement learning, unmanned vehicle, motion planning, neural network

CLC Number: TP242.6

Ding Kaiyuan, Hamdulla Askar, Zhu Bin, Firkat Eksan, Ma Zhengtang. End-to-end Motion Planning of Unmanned Vehicles Based on Multimodal Deep Reinforcement Learning[J]. Journal of System Simulation, 2024, 36(11): 2631-2643.

Figures/Tables: 14 (Figs. 1-8 and Tables 1-6)

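Training parameter	Value
Maximum steps per episode	1000
Discount factor	0.99
Clipping parameter	0.2
Batch size	256
Optimizer	Adam
Policy network learning rate	1e-4
Value network learning rate	1e-4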
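The discount factor (0.99) and clipping parameter (0.2) in the training-parameter table are the two PPO hyperparameters that enter the loss directly. The sketch below shows where they appear in the standard clipped-surrogate formulation of PPO; it is an illustration under that assumption, since the article summary does not give the authors' value loss, entropy term, or advantage estimator. The batch size, Adam optimizer, and learning rates would govern the gradient steps taken on this loss and are not shown. The function names `discounted_returns` and `ppo_clip_loss` are hypothetical.

```python
import numpy as np

GAMMA = 0.99     # discount factor listed in the training-parameter table
CLIP_EPS = 0.2   # PPO clipping parameter listed in the same table


def discounted_returns(rewards, dones, gamma=GAMMA):
    """Monte Carlo returns G_t = r_t + gamma * G_{t+1}, restarted at episode ends."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        if dones[t]:  # step t ends an episode, so no future reward is carried back past it
            running = 0.0
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def ppo_clip_loss(new_logp, old_logp, advantages, eps=CLIP_EPS):
    """Negated PPO clipped surrogate objective, averaged over a batch."""
    ratio = np.exp(new_logp - old_logp)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # The elementwise minimum removes the incentive to move the ratio outside
    # [1 - eps, 1 + eps] when doing so would raise the objective; the sign flip
    # turns the maximization into a loss for a gradient-descent optimizer.
    return -np.mean(np.minimum(unclipped, clipped))


print(discounted_returns([1.0, 1.0, 1.0], [False, False, True]))  # approximately [2.9701, 1.99, 1.0]
```

With a clipping parameter of 0.2, a sample with positive advantage contributes at most 1.2 times its advantage, however far the new policy's probability ratio rises.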

Method	Distance traveled/m	Collisions
State-Only	1.1±1.1	—
Depth-Only	7.6±1.3	456.3±62.2
State-Depth-Concat	10.3±1.4	238.2±143.4
HRL	11.6±2.1	206.8±69.4
Ours	14.4±2.5	85.4±101.3

Method	Distance traveled/m	Ball reward	Collisions
State-Only	1.1±1.1	—	—
Depth-Only	3.6±1.3	80.0±43.0	456.3±262.2
State-Depth-Concat	5.6±2.1	206.4±41.1	226.8±59.5
HRL	10.8±0.8	167.2±52.3	267.2±82.3
Ours	15.2±2.5	233.3±47.1	250.4±71.3

Method	Distance traveled/m	Collisions
State-Only	1.1±1.1	—
Depth-Only	5.6±1.3	156.3±26.2
State-Depth-Concat	12.6±2.1	96.8±39.5
HRL	7.2±2.6	75.2±13.4
Ours	10.2±2.5	67.8±19.3

Method	Distance traveled (3D)
State-Only	2.1±0.3
Depth-Only	3.8±1.4
State-Depth-Concat	4.6±0.9
HRL	6.2±0.3
Ours	6.9±1.2