Journal of System Simulation ›› 2024, Vol. 36 ›› Issue (11): 2631-2643. doi: 10.16182/j.issn1004731x.joss.23-0939

• Research Paper •

基于多模态深度强化学习的端到端无人车运动规划

丁开源1,2, 艾斯卡尔·艾木都拉1,2, 朱斌3, 伊克萨尼·普尔凯提1, 马正堂1   

  1. 新疆大学 计算机科学与技术学院,新疆 乌鲁木齐 830017
  2. 新疆信号检测与处理重点实验室,新疆 乌鲁木齐 830017
  3. 清华大学 自动化系,北京 100084
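  • Received: 2023-07-25 Revised: 2023-10-11 Online: 2024-11-13 Published: 2024-11-19
  • About the first author: Ding Kaiyuan (born 1996), male, master's student; his research focuses on intelligent unmanned systems.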

End-to-end Motion Planning of Unmanned Vehicles Based on Multimodal Deep Reinforcement Learning

Ding Kaiyuan1,2, Hamdulla Askar1,2, Zhu Bin3, Firkat Eksan1, Ma Zhengtang1   

  1. School of Computer Science and Technology, Xinjiang University, Urumqi 830017, China
  2. Xinjiang Key Laboratory of Signal Detection and Processing, Urumqi 830017, China
  3. Department of Automation, Tsinghua University, Beijing 100084, China
  • Received: 2023-07-25 Revised: 2023-10-11 Online: 2024-11-13 Published: 2024-11-19

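Abstract:

When reinforcement learning is applied to robot motion planning, the agent cannot perceive its surroundings and cannot avoid obstacles effectively, so the learned policy does not extend to complex, challenging terrain. To address these problems, multimodal deep reinforcement learning is proposed for the motion planning of unmanned vehicles; the method learns how to combine proprioceptive states with high-dimensional depth sensor inputs. Specifically, proprioceptive states provide contact measurements for immediate reaction, while the onboard visual sensor lets the unmanned vehicle learn to predict environmental changes and maneuver proactively, several time steps in advance, to cope with obstacles and uneven terrain. A new end-to-end multimodal Transformer fusion model, TransProAct (transformer-based proactive action), is proposed. Its self-attention mechanism fuses proprioceptive states with visual information, the deep reinforcement learning algorithm PPO trains the unmanned vehicle to learn motion planning on its own, and multimodal delay randomization is introduced to address the discrepancy between simulation and the real world. Evaluations in challenging simulation environments with different obstacles and uneven terrain show that the multimodal deep reinforcement learning method not only improves markedly on the baseline but also generalizes much better.

Key words: multimodal perception, reinforcement learning, unmanned vehicle, motion planning, neural network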

Abstract:

When reinforcement learning is applied to robot motion planning, the agent cannot sense its surroundings or reliably avoid obstacles, so the learned policy fails to generalize to complex, challenging terrain. To address this, an approach based on multimodal deep reinforcement learning, which learns to combine proprioceptive states with high-dimensional depth sensor inputs, is proposed for the motion planning of unmanned vehicles. Specifically, proprioceptive states provide contact measurements for immediate reaction, while the onboard visual sensor lets the unmanned vehicle learn to anticipate environmental changes and maneuver proactively around obstacles and over uneven terrain several time steps ahead. TransProAct (transformer-based proactive action), a novel end-to-end multimodal Transformer fusion model, is proposed. Proprioceptive states and visual data are fused through its self-attention mechanism, and the deep reinforcement learning algorithm PPO is then used to train the unmanned vehicle to learn motion planning on its own. In addition, multimodal delay randomization is introduced to address the discrepancy between simulation and the real world. Evaluated in challenging simulation environments with a variety of obstacles and uneven terrain, the proposed approach shows notable gains over the baseline and a marked improvement in generalization ability.
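As a rough illustration of how a self-attention mechanism can fuse proprioceptive states with depth input, the sketch below embeds one proprioceptive vector and a grid of depth-image patches as tokens and passes them through a single attention head. It is not the TransProAct architecture, which the abstract does not detail: the names `make_params` and `fuse`, the patch size, the token dimension, the single head, and the pooling at the end are all assumptions made for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def make_params(proprio_dim, patch, d_model, seed=0):
    """Random projection matrices (hypothetical; a real model would learn them)."""
    rng = np.random.default_rng(seed)
    shapes = {
        "W_pro": (proprio_dim, d_model),    # proprioceptive state -> token
        "W_vis": (patch * patch, d_model),  # flattened depth patch -> token
        "W_q": (d_model, d_model),
        "W_k": (d_model, d_model),
        "W_v": (d_model, d_model),
    }
    return {name: rng.normal(0.0, 0.1, size=shape) for name, shape in shapes.items()}


def fuse(proprio, depth, params, patch):
    """Fuse one proprioceptive token with depth-patch tokens via self-attention."""
    h, w = depth.shape
    if h % patch or w % patch:
        raise ValueError("depth image sides must be multiples of the patch size")

    # Cut the depth image into non-overlapping patch x patch tiles and flatten each.
    tiles = depth.reshape(h // patch, patch, w // patch, patch)
    tiles = tiles.transpose(0, 2, 1, 3).reshape(-1, patch * patch)

    vis_tokens = tiles @ params["W_vis"]                        # (n_patches, d_model)
    pro_token = (proprio @ params["W_pro"])[None, :]            # (1, d_model)
    tokens = np.concatenate([pro_token, vis_tokens], axis=0)    # (1 + n_patches, d_model)

    # Single-head scaled dot-product self-attention over all tokens, so the
    # proprioceptive token can attend to visual tokens and vice versa.
    q = tokens @ params["W_q"]
    k = tokens @ params["W_k"]
    v = tokens @ params["W_v"]
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    fused = attn @ v

    # Pool into one feature vector: the fused proprioceptive token followed by
    # the mean of the fused visual tokens (an assumed pooling choice).
    feature = np.concatenate([fused[0], fused[1:].mean(axis=0)])
    return feature, attn


if __name__ == "__main__":
    params = make_params(proprio_dim=12, patch=4, d_model=16)
    rng = np.random.default_rng(1)
    feature, attn = fuse(rng.normal(size=12), rng.random((16, 16)), params, patch=4)
    print(feature.shape, attn.shape)
```

In a reinforcement learning setup, a feature vector like this would typically feed the policy and value heads that PPO optimizes.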

Key words: multimodal perception, reinforcement learning, unmanned vehicle, motion planning, neural network
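The abstract mentions multimodal delay randomization without describing how it is implemented. One plausible reading, sketched below purely as an assumption, is that each sensor modality is delivered to the policy with its own latency, resampled at the start of every simulated episode, so the policy does not depend on perfectly synchronized inputs. The class names `DelayedStream` and `MultimodalDelayRandomizer`, the uniform sampling, and the per-episode resampling are hypothetical choices for this example.

```python
import random
from collections import deque


class DelayedStream:
    """Returns the observation pushed `delay` steps ago (oldest available at episode start)."""

    def __init__(self, max_delay):
        # Keep just enough history to serve the largest possible delay.
        self.buffer = deque(maxlen=max_delay + 1)
        self.delay = 0

    def reset(self, delay):
        if not 0 <= delay < self.buffer.maxlen:
            raise ValueError("delay must lie in [0, max_delay]")
        self.buffer.clear()
        self.delay = delay

    def push(self, obs):
        self.buffer.append(obs)
        # Index of the observation `delay` steps before the newest one,
        # clamped to the oldest entry while the buffer is still filling.
        index = max(len(self.buffer) - 1 - self.delay, 0)
        return self.buffer[index]


class MultimodalDelayRandomizer:
    """Applies an independently sampled latency to each modality (assumed scheme)."""

    def __init__(self, max_delays, seed=None):
        self.rng = random.Random(seed)
        self.max_delays = dict(max_delays)
        self.streams = {name: DelayedStream(m) for name, m in self.max_delays.items()}

    def reset(self):
        """Sample a new delay (in simulation steps) per modality for the next episode."""
        delays = {name: self.rng.randint(0, m) for name, m in self.max_delays.items()}
        for name, delay in delays.items():
            self.streams[name].reset(delay)
        return delays

    def step(self, observations):
        """Take the current observations and return their delayed versions."""
        return {name: self.streams[name].push(obs) for name, obs in observations.items()}


if __name__ == "__main__":
    randomizer = MultimodalDelayRandomizer({"proprio": 1, "depth": 3}, seed=0)
    print("sampled delays:", randomizer.reset())
    for t in range(5):
        print(t, randomizer.step({"proprio": t, "depth": t}))
```

Giving the depth stream a larger maximum delay than the proprioceptive stream reflects the common situation where camera frames arrive later than onboard state estimates; the specific bounds here are illustrative only.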

CLC Number: