DP-Q(λ): Real-time Path Planning for Multi-agent in Large-scale Web3D Scene

doi:10.16182/j.issn1004731x.joss.16PQS-003

Abstract

Path planning for multiple agents in an unknown large-scale scene requires an efficient and stable algorithm that also handles collision avoidance among agents, so that planning can run in real time in Web3D. To address these problems, the DP-Q(λ) algorithm is proposed. It uses direction constraints and training with high reward or punishment weights, and adjusts the reward or punishment values by a probability p (a random number between 0 and 1). The resulting reward or punishment value determines the agent's strategy for its next step: if the next position is free, the agent can move to it. This strategy is extended to multi-agent path planning and applied in Web3D. Experiments show that the DP-Q(λ) algorithm is efficient and stable for real-time multi-agent path planning in Web3D.
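The abstract does not state the update rule, so the following is only a minimal sketch of how such a scheme might look, not the authors' implementation. It assumes a 2D grid, four moves, a generic tabular Watkins-style Q(λ) update with replacing traces, and a hypothetical reward rule (`shaped_reward`) in which a random draw compared with p selects a high or a base weight, and the sign comes from collisions and from whether the move brings the agent closer to the goal. All function names, parameters, and numeric values are illustrative assumptions.

```python
import random

ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # four grid moves (assumed action set)


def is_free(cell, width, height, obstacles):
    """A cell is free if it lies inside the grid and is not an obstacle."""
    x, y = cell
    return 0 <= x < width and 0 <= y < height and cell not in obstacles


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def shaped_reward(pos, nxt, goal, free, p, rng, high=10.0, base=1.0):
    """Hypothetical reward rule (an assumption, not taken from the paper)."""
    # A uniform draw in [0, 1) compared with p picks the high or base weight.
    weight = high if rng.random() < p else base
    if not free:
        return -weight  # punishment for a wall or obstacle
    if nxt == goal:
        return high     # reaching the goal always earns the high reward
    # Direction constraint: small reward if the move reduces distance to goal.
    closer = manhattan(nxt, goal) < manhattan(pos, goal)
    return 0.1 * weight if closer else -0.1 * weight


def greedy(q, s):
    values = [q.get((s, a), 0.0) for a in range(len(ACTIONS))]
    return values.index(max(values))


def choose(q, s, epsilon, rng):
    """Epsilon-greedy action selection."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    return greedy(q, s)


def train(width, height, obstacles, start, goal, episodes=200, max_steps=200,
          alpha=0.2, gamma=0.9, lam=0.8, epsilon=0.1, p=0.5, seed=0):
    """Tabular Watkins-style Q(lambda) with replacing eligibility traces."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        traces = {}
        s = start
        a = choose(q, s, epsilon, rng)
        for _ in range(max_steps):
            dx, dy = ACTIONS[a]
            nxt = (s[0] + dx, s[1] + dy)
            free = is_free(nxt, width, height, obstacles)
            r = shaped_reward(s, nxt, goal, free, p, rng)
            s2 = nxt if free else s  # the agent moves only into a free cell
            done = s2 == goal
            a2 = choose(q, s2, epsilon, rng)
            a_star = greedy(q, s2)
            # Traces survive only if the next action is greedy (ties count).
            keep = q.get((s2, a2), 0.0) == q.get((s2, a_star), 0.0)
            target = r if done else r + gamma * q.get((s2, a_star), 0.0)
            delta = target - q.get((s, a), 0.0)
            traces[(s, a)] = 1.0  # replacing trace
            for key in list(traces):
                q[key] = q.get(key, 0.0) + alpha * delta * traces[key]
                traces[key] = gamma * lam * traces[key] if keep else 0.0
            if done:
                break
            s, a = s2, a2
    return q


def greedy_path(q, width, height, obstacles, start, goal, max_steps=100):
    """Follow the learned values, stepping only into free neighbouring cells."""
    path, s = [start], start
    for _ in range(max_steps):
        if s == goal:
            break
        moves = []
        for a, (dx, dy) in enumerate(ACTIONS):
            nxt = (s[0] + dx, s[1] + dy)
            if is_free(nxt, width, height, obstacles):
                moves.append((q.get((s, a), 0.0), nxt))
        if not moves:
            break
        s = max(moves)[1]
        path.append(s)
    return path
```

A call such as `greedy_path(train(4, 4, {(1, 1)}, (0, 0), (3, 3)), 4, 4, {(1, 1)}, (0, 0), (3, 3))` returns a list of grid cells starting at the start cell. For several agents, one simple option under the same assumptions is to add the cells currently occupied by other agents to `obstacles` before each step, so that an agent moves only into a position that is free at that moment.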

Key words: Web3D, large-scale unknown environment, multi-agent, reinforcement learning, dynamic rewards p, path planning

CLC Number: TP391

Yan Fengting, Jia Jinyuan. DP-Q(λ): Real-time Path Planning for Multi-agent in Large-scale Web3D Scene[J]. Journal of System Simulation, 2019, 31(1): 16-26.
