%X Multi-agent path planning in an unknown, large-scale scene requires an efficient and stable algorithm that also avoids collisions between agents, so that paths can be planned in real time in Web3D. To solve these problems, the DP-Q(λ) algorithm is proposed. It uses direction constraints and a high reward/punishment weight training method, in which the values of the rewards or punishments are adjusted by a probability p (a random number between 0 and 1). The resulting reward or punishment value determines the agent's path-planning strategy for the next step: if the next position is free, the agent moves to it. This strategy is then extended to multi-agent path planning and applied in Web3D. Experiments show that the DP-Q(λ) algorithm is efficient and stable for real-time multi-agent path planning in Web3D.
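The next-step rule described above (reward or punishment decides the move, and the agent only walks into a free position) can be illustrated with a minimal sketch. It assumes standard tabular Watkins Q(λ) with eligibility traces on a small hypothetical grid. The grid, the reward values, the `high_weight` factor, the way p scales a reward, and the `occupied` collision check are all illustrative assumptions, not the authors' DP-Q(λ) formulation, and the direction constraints are omitted.

```python
import random

# Hypothetical 5x5 scene: 0 = free cell, 1 = obstacle.
GRID = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def is_free(cell, occupied=frozenset()):
    """A cell is walkable if it is inside the grid, not an obstacle, and not
    occupied by another agent (an assumed multi-agent collision check)."""
    r, c = cell
    return (0 <= r < len(GRID) and 0 <= c < len(GRID[0])
            and GRID[r][c] == 0 and cell not in occupied)


def weighted(value, high_weight=5.0):
    """Hypothetical reading of the p-weighting: scale a reward or punishment
    by a factor between 1 and high_weight, driven by p drawn from U(0, 1)."""
    p = random.random()
    return value * (1.0 + (high_weight - 1.0) * p)


def q_lambda(start, goal, episodes=500, alpha=0.1, gamma=0.95,
             lam=0.8, eps=0.1, max_steps=200, seed=0):
    """Tabular Watkins Q(lambda) with accumulating eligibility traces."""
    random.seed(seed)
    Q = {}

    def q(s, a):
        return Q.get((s, a), 0.0)

    def greedy(s):
        best = max(q(s, a) for a in range(len(ACTIONS)))
        return random.choice([a for a in range(len(ACTIONS)) if q(s, a) == best])

    def eps_greedy(s):
        return random.randrange(len(ACTIONS)) if random.random() < eps else greedy(s)

    for _ in range(episodes):
        traces = {}
        s = start
        a = eps_greedy(s)
        for _ in range(max_steps):
            nxt = (s[0] + ACTIONS[a][0], s[1] + ACTIONS[a][1])
            if is_free(nxt):
                s2 = nxt  # next position is free: walk to it
                r = weighted(10.0) if s2 == goal else -1.0
            else:
                s2 = s  # blocked: stay in place and receive a weighted punishment
                r = weighted(-5.0)
            a2 = eps_greedy(s2)
            a_star = greedy(s2)
            if q(s2, a2) == q(s2, a_star):
                a_star = a2  # the chosen action is also greedy
            target = r if s2 == goal else r + gamma * q(s2, a_star)
            delta = target - q(s, a)
            traces[(s, a)] = traces.get((s, a), 0.0) + 1.0
            for key, e in list(traces.items()):
                Q[key] = Q.get(key, 0.0) + alpha * delta * e
                # Watkins: traces keep decaying only after a greedy action.
                traces[key] = gamma * lam * e if a2 == a_star else 0.0
            if s2 == goal:
                break
            s, a = s2, a2
    return Q


def greedy_path(Q, start, goal, max_steps=50):
    """Follow the learned greedy policy, stopping at the goal or a blocked move."""
    path, s = [start], start
    for _ in range(max_steps):
        if s == goal:
            break
        a = max(range(len(ACTIONS)), key=lambda a: Q.get((s, a), 0.0))
        nxt = (s[0] + ACTIONS[a][0], s[1] + ACTIONS[a][1])
        if not is_free(nxt):
            break
        s = nxt
        path.append(s)
    return path
```

For example, `greedy_path(q_lambda((0, 0), (4, 4)), (0, 0), (4, 4))` returns the cells visited by the learned policy. The sketch trains a single agent; in a multi-agent setting, the other agents' current cells could be passed as `occupied` so that a move into an occupied cell is treated like a move into an obstacle.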
