Journal of System Simulation ›› 2019, Vol. 31 ›› Issue (1): 16-26. doi: 10.16182/j.issn1004731x.joss.16PQS-003

• Simulation Modeling Theory and Methods •

DP-Q(λ): A Real-time Multi-agent Path Planning Algorithm for Large-scale Web3D Scenes

Yan Fengting, Jia Jinyuan

  1. Tongji University, Shanghai 201804, China
  • Received: 2016-05-31  Revised: 2016-08-04  Online: 2019-01-08  Published: 2019-04-16
  • About the authors: Yan Fengting (1980-), male, from Shandong, Ph.D. candidate; research interests: virtual reality and machine learning. Jia Jinyuan (1963-), male, from Inner Mongolia, Ph.D., professor; research interests: distributed virtual reality, Web3D, and game engines.
  • Funding:
    General Program of the National Natural Science Foundation of China (61272270)

DP-Q(λ): Real-time Multi-agent Path Planning in Large-scale Web3D Scenes

Yan Fengting, Jia Jinyuan   

  1. School of Software Engineering, Tongji University, Shanghai 201804, China
  • Received: 2016-05-31  Revised: 2016-08-04  Online: 2019-01-08  Published: 2019-04-16

Abstract: Visual path planning for multiple agents in a large-scale scene must achieve real-time, stable collision avoidance in Web3D. A dynamic-probability single-chain convergent backtracking algorithm, DP-Q(λ), is proposed. It applies a direction-heuristic constraint and a high-reward/heavy-punishment training scheme; on a single agent, a probability p (a random number in 0-1) modulates the reward or punishment value, which determines the next path-finding step, while the agent also senses whether the next position is free, achieving collision avoidance as it walks. The single-agent path planning scheme is then extended to multiple agents and further implemented in Web3D. Experimental results show that the multi-agent real-time path planning realized by this algorithm meets the efficiency and stability requirements of autonomous learning in Web3D.

Keywords: Web3D, large-scale unknown environment, multi-agent, reinforcement learning, dynamic reward p, path planning

Abstract: Path planning for multiple agents in an unknown large-scale scene needs an efficient and stable algorithm that also solves the multi-agent collision avoidance problem, so that real-time path planning can be completed in Web3D. To address these problems, the DP-Q(λ) algorithm is proposed: direction constraints and a high-reward/heavy-punishment weight training method are used, with a probability p (a 0-1 random number) adjusting the reward or punishment value. The resulting value determines the next step of the path planning strategy, and an agent moves to the next position only if that position is free. This strategy is extended to multi-agent path planning and implemented in Web3D. Experiments show that the DP-Q(λ) algorithm is efficient and stable for real-time multi-agent path planning in Web3D.
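The reward-modulation and collision-check strategy described in the abstract can be sketched as follows. This is a hypothetical minimal reconstruction, not the paper's implementation: the grid world, reward magnitudes (100/-100), learning parameters, and helper names (`step_reward`, `q_lambda_episode`) are all illustrative assumptions. Only the core ideas come from the abstract: a random probability p scaling the reward/punishment, a Q(λ)-style backtracking update along an eligibility trace, and moving only when the next cell is free.

```python
import random
from collections import defaultdict

# Illustrative sketch of the DP-Q(lambda) idea (assumed details, not the
# paper's code): a tabular Q(lambda) learner on an unbounded grid, where
# the reward/punishment magnitude is scaled by a random p in [0, 1), and
# an agent steps only onto cells not occupied by other agents.

ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up

def step_reward(next_cell, goal, obstacles):
    """High reward at the goal, heavy punishment on obstacles, scaled by p."""
    p = random.random()               # dynamic probability p in [0, 1)
    if next_cell == goal:
        return 100.0 * p              # high reward, randomly modulated
    if next_cell in obstacles:
        return -100.0 * p             # heavy punishment, randomly modulated
    return -1.0                       # small step cost elsewhere

def q_lambda_episode(Q, E, start, goal, obstacles, occupied,
                     alpha=0.1, gamma=0.9, lam=0.8, eps=0.1, max_steps=200):
    """One Q(lambda)-style episode with an occupancy (collision) check.

    Q and E are defaultdict(float) tables keyed by (state, action).
    Grid bounds are omitted for brevity; obstacles can act as walls.
    """
    s = start
    for _ in range(max_steps):
        # epsilon-greedy action choice
        if random.random() < eps:
            a = random.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda i: Q[(s, i)])
        dx, dy = ACTIONS[a]
        nxt = (s[0] + dx, s[1] + dy)
        if nxt in occupied:           # next cell held by another agent: wait
            continue
        r = step_reward(nxt, goal, obstacles)
        best_next = max(Q[(nxt, i)] for i in range(len(ACTIONS)))
        delta = r + gamma * best_next - Q[(s, a)]
        E[(s, a)] += 1.0              # accumulate eligibility trace
        for k in list(E):             # backtracking update along the trace
            Q[k] += alpha * delta * E[k]
            E[k] *= gamma * lam       # decay the trace
        if nxt == goal or nxt in obstacles:
            return nxt                # episode ends at goal or obstacle
        s = nxt
    return s
```

In a multi-agent setting, each agent would run this loop with `occupied` holding the other agents' current cells, which realizes the "sense whether the next position is free" behavior from the abstract.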

Key words: Web3D, large-scale unknown environment, multi-agent, reinforcement learning, dynamic reward p, path planning

CLC number: