Journal of System Simulation ›› 2025, Vol. 37 ›› Issue (3): 595-606. doi: 10.16182/j.issn1004731x.joss.24-0088


基于深度强化学习的AGV行人避让策略研究

王贺1,2, 许佳宁1, 闫广宇1,2   

  1. 沈阳建筑大学 机械工程学院,辽宁 沈阳 110000
  2. 沈阳建筑大学 现代建筑工程装备与技术国际合作联合实验室,辽宁 沈阳 110000
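  • Received: 2024-01-22 Revised: 2024-03-07 Online: 2025-03-17 Published: 2025-03-21
  • Corresponding author: 闫广宇 (Yan Guangyu)
  • About the first author: 王贺 (Wang He) (1981-), male, associate professor, Ph.D.; research interests: preparation and machining technology for ceramic parts.
  • Funding:
    National Natural Science Foundation of China (51942507); Programme of Introducing Talents of Discipline to Universities (D18017); Young Elite Scientists Sponsorship Program by CAST (2023QNRC001); Project of the Education Department of Liaoning Province (Infw202017)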

Research on Pedestrian Avoidance Strategy for AGV Based on Deep Reinforcement Learning

Wang He1,2, Xu Jianing1, Yan Guangyu1,2   

  1. School of Mechanical Engineering, Shenyang Jianzhu University, Shenyang 110000, China
  2. Joint International Research Laboratory of Modern Construction Engineering Equipment and Technology, Shenyang Jianzhu University, Shenyang 110000, China
  • Received: 2024-01-22 Revised: 2024-03-07 Online: 2025-03-17 Published: 2025-03-21
  • Contact: Yan Guangyu

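Abstract:

To ensure pedestrian safety and comfort when an automated guided vehicle (AGV) avoids obstacles in a smart factory environment, an end-to-end AGV obstacle avoidance method based on deep reinforcement learning is proposed. A YOLOv8 module is introduced to extract pedestrian pose information, and a vision-based state space is designed. A reward and penalty mechanism for reinforcement learning is designed according to personal space theory, penalizing behaviors such as the AGV entering a pedestrian's comfort space or colliding. A virtual simulation system is built, and the obstacle avoidance policy is trained using PPO combined with an LSTM network layer and verified through simulation experiments. The simulation results show that, without building an environment map and with visual input, the policy can control the AGV to keep a comfortable social distance from pedestrians during obstacle avoidance.

Keywords: deep reinforcement learning, automated guided vehicle, YOLOv8, proximal policy optimization, obstacle avoidance, personal space theory, end-to-end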

Abstract:

To ensure the safety and comfort of pedestrians during Automated Guided Vehicle (AGV) obstacle avoidance in smart factory environments, a deep reinforcement learning-based end-to-end obstacle avoidance method is proposed. A YOLOv8 module is introduced to extract pedestrian pose information, and a vision-based state space is designed. A reward and penalty mechanism for reinforcement learning is formulated based on personal space theory, penalizing AGV behaviors such as entering a pedestrian's comfort space and colliding. A virtual simulation system is constructed, and the obstacle avoidance strategy is trained using the PPO algorithm combined with an LSTM network layer and then validated in simulation experiments. The simulation results indicate that, without building an environment map and with visual input, the strategy can control the AGV to maintain a comfortable social distance from pedestrians during obstacle avoidance.

Key words: deep reinforcement learning (DRL), automated guided vehicle (AGV), YOLOv8, proximal policy optimization (PPO), obstacle avoidance, personal space theory, end-to-end
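
The abstract describes a reward and penalty mechanism that penalizes the AGV for entering a pedestrian's comfort space and for collisions. The Python sketch below illustrates one way such a distance-based penalty could be shaped. It is not the authors' reward function: the function name `avoidance_reward`, the two radii, the penalty magnitudes, the linear ramp, and the use of planar ground-truth positions (as a simulator could supply) are all assumptions made for illustration.

```python
import math

# All numeric values below are illustrative assumptions, not taken from the paper.
COLLISION_RADIUS_M = 0.3     # assumed contact distance between AGV and pedestrian
COMFORT_RADIUS_M = 1.2       # assumed outer edge of the pedestrian's comfort space
COLLISION_PENALTY = -1.0     # assumed penalty when a collision occurs
MAX_COMFORT_PENALTY = -0.5   # assumed peak per-step penalty inside the comfort space


def avoidance_reward(agv_xy, pedestrian_xy):
    """Return (reward, done) for one step, given planar positions in metres."""
    distance = math.dist(agv_xy, pedestrian_xy)

    # Collision: apply the largest penalty and end the episode.
    if distance <= COLLISION_RADIUS_M:
        return COLLISION_PENALTY, True

    # Inside the comfort space: the penalty grows linearly from 0 at the
    # comfort boundary to MAX_COMFORT_PENALTY at the collision boundary.
    if distance < COMFORT_RADIUS_M:
        depth = (COMFORT_RADIUS_M - distance) / (COMFORT_RADIUS_M - COLLISION_RADIUS_M)
        return MAX_COMFORT_PENALTY * depth, False

    # Outside the comfort space: no penalty from this term.
    return 0.0, False


if __name__ == "__main__":
    for ped in [(2.0, 0.0), (0.75, 0.0), (0.2, 0.0)]:
        print(ped, avoidance_reward((0.0, 0.0), ped))
```

In a full training setup a term like this would typically be combined with other terms (for example, progress toward a goal), which the abstract does not specify and which are therefore left out here.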

CLC number: