Journal of System Simulation ›› 2019, Vol. 31 ›› Issue (1): 16-26. doi: 10.16182/j.issn1004731x.joss.16PQS-003

• Simulation Modeling Theory and Methods •

DP-Q(λ): A Real-time Multi-agent Path Planning Algorithm for Large-scale Web3D Scenes

Yan Fengting, Jia Jinyuan

  1. Tongji University, Shanghai 201804, China
  • Received: 2016-05-31  Revised: 2016-08-04  Online: 2019-01-08  Published: 2019-04-16
  • About the authors: Yan Fengting (1980-), male, from Shandong, Ph.D. candidate; research interests: virtual reality and machine learning. Jia Jinyuan (1963-), male, from Inner Mongolia, Ph.D., professor; research interests: distributed virtual reality, Web3D, and game engines.
  • Funding:
    General Program of the National Natural Science Foundation of China (61272270)

DP-Q(λ): Real-time Multi-agent Path Planning in Large-scale Web3D Scenes

Yan Fengting, Jia Jinyuan   

  1. School of Software Engineering, Tongji University, Shanghai 201804, China
  • Received: 2016-05-31  Revised: 2016-08-04  Online: 2019-01-08  Published: 2019-04-16

Abstract: Visual path planning for multiple agents in a large-scale scene must achieve real-time, stable collision avoidance in Web3D. A dynamic-probability single-chain convergent backtracking algorithm, DP-Q(λ), is proposed. It applies a direction-heuristic constraint and a high-reward/heavy-punishment training scheme; on a single agent, a probability p (a random number in 0-1) modulates the reward or punishment value, which determines the next path-finding step, while the agent also senses whether the next position is free, achieving collision avoidance as it walks. The single-agent path planning scheme is then extended to multiple agents and further implemented in Web3D. Experimental results show that the multi-agent real-time path planning realized by this algorithm meets the efficiency and stability requirements of autonomous learning in Web3D.

Keywords: Web3D, large-scale unknown environment, multi-agent, reinforcement learning, dynamic reward p, path planning

Abstract: Path planning for multiple agents in an unknown large-scale scene needs an efficient and stable algorithm that also solves the multi-agent collision avoidance problem, so that real-time path planning can be completed in Web3D. To address these problems, the DP-Q(λ) algorithm is proposed: direction constraints and a high-reward/heavy-punishment weight training method are used, with a probability p (a 0-1 random number) adjusting the reward or punishment value. The resulting value determines the next step of the path planning strategy, and an agent moves to the next position only if that position is free. This strategy is extended to multi-agent path planning and implemented in Web3D. Experiments show that the DP-Q(λ) algorithm is efficient and stable for real-time multi-agent path planning in Web3D.
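The reward-modulation and collision-check strategy described in the abstract can be sketched as follows. This is a hypothetical minimal reconstruction, not the paper's implementation: the grid world, reward magnitudes (100/-100), learning parameters, and helper names (`step_reward`, `q_lambda_episode`) are all illustrative assumptions. Only the core ideas come from the abstract: a random probability p scaling the reward/punishment, a Q(λ)-style backtracking update along an eligibility trace, and moving only when the next cell is free.

```python
import random
from collections import defaultdict

# Illustrative sketch of the DP-Q(lambda) idea (assumed details, not the
# paper's code): a tabular Q(lambda) learner on an unbounded grid, where
# the reward/punishment magnitude is scaled by a random p in [0, 1), and
# an agent steps only onto cells not occupied by other agents.

ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up

def step_reward(next_cell, goal, obstacles):
    """High reward at the goal, heavy punishment on obstacles, scaled by p."""
    p = random.random()               # dynamic probability p in [0, 1)
    if next_cell == goal:
        return 100.0 * p              # high reward, randomly modulated
    if next_cell in obstacles:
        return -100.0 * p             # heavy punishment, randomly modulated
    return -1.0                       # small step cost elsewhere

def q_lambda_episode(Q, E, start, goal, obstacles, occupied,
                     alpha=0.1, gamma=0.9, lam=0.8, eps=0.1, max_steps=200):
    """One Q(lambda)-style episode with an occupancy (collision) check.

    Q and E are defaultdict(float) tables keyed by (state, action).
    Grid bounds are omitted for brevity; obstacles can act as walls.
    """
    s = start
    for _ in range(max_steps):
        # epsilon-greedy action choice
        if random.random() < eps:
            a = random.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda i: Q[(s, i)])
        dx, dy = ACTIONS[a]
        nxt = (s[0] + dx, s[1] + dy)
        if nxt in occupied:           # next cell held by another agent: wait
            continue
        r = step_reward(nxt, goal, obstacles)
        best_next = max(Q[(nxt, i)] for i in range(len(ACTIONS)))
        delta = r + gamma * best_next - Q[(s, a)]
        E[(s, a)] += 1.0              # accumulate eligibility trace
        for k in list(E):             # backtracking update along the trace
            Q[k] += alpha * delta * E[k]
            E[k] *= gamma * lam       # decay the trace
        if nxt == goal or nxt in obstacles:
            return nxt                # episode ends at goal or obstacle
        s = nxt
    return s
```

In a multi-agent setting, each agent would run this loop with `occupied` holding the other agents' current cells, which realizes the "sense whether the next position is free" behavior from the abstract.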

Key words: Web3D, large-scale unknown environment, multi-agent, reinforcement learning, dynamic reward p, path planning

CLC number: