Journal of System Simulation ›› 2026, Vol. 38 ›› Issue (3): 714-724.doi: 10.16182/j.issn1004731x.joss.25-0399


Robot Path Planning by Reinforcement Learning Based on SAC3Q-HDM

Li Dequan1,2, Xiong Wan1   

  1. School of Artificial Intelligence, Anhui University of Science and Technology, Hefei 231131, China
  2. State Key Laboratory of Digital Intelligent Technology for Unmanned Coal Mining, Anhui University of Science and Technology, Huainan 232001, China
  Received: 2025-05-09 Revised: 2025-07-25 Online: 2026-03-18 Published: 2026-03-27

Abstract:

To address the problems of overestimation and underestimation bias, low sample utilization, and the difficulty of balancing exploration and exploitation in reinforcement-learning-based path planning, an improved SAC method was proposed. The balance between exploration and exploitation was maintained by adaptively adjusting the temperature coefficient that weights the entropy term. On the basis of the SAC framework, a triple-Critic architecture was introduced, in which the minimum and average Q-values are dynamically weighted and fused according to Q-value uncertainty, thereby balancing overestimation and underestimation biases. A hybrid dynamic sampling experience replay buffer was designed: experience data are partitioned by reward thresholds, and sampling ratios are dynamically adjusted to achieve progressive learning from core strategies to comprehensive generalization. A hierarchical heuristic reward function was designed to guide the robot in balancing the multi-objective requirements of approaching the goal and avoiding obstacles. The simulation results demonstrate that the improved algorithm outperforms baseline methods in path length, planning time, and success rate, enhancing both the efficiency and robustness of path planning.
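The uncertainty-weighted fusion of the triple-Critic estimates can be sketched as follows. Note that the abstract does not give the exact weighting formula; the mapping from uncertainty to the fusion weight (`w = 1 - exp(-beta * sigma)`) and the function name `fused_q_target` are illustrative assumptions, not the paper's definition.

```python
import numpy as np

def fused_q_target(q_values, beta=1.0):
    """Fuse three Critic Q-estimates into one target value.

    q_values: array-like of shape (3,) or (3, batch) holding Q1, Q2, Q3.
    The disagreement (standard deviation) across the critics serves as
    the Q-value uncertainty: high disagreement shifts the target toward
    the conservative minimum (curbing overestimation), while low
    disagreement shifts it toward the average (curbing underestimation).
    """
    q = np.asarray(q_values, dtype=float)
    q_min = q.min(axis=0)            # pessimistic estimate
    q_mean = q.mean(axis=0)          # unbiased-leaning estimate
    sigma = q.std(axis=0)            # Q-value uncertainty across critics
    w = 1.0 - np.exp(-beta * sigma)  # weight in [0, 1), grows with uncertainty
    return w * q_min + (1.0 - w) * q_mean
```

When all three critics agree, `sigma` is zero and the target equals their common value; as disagreement grows, the target moves smoothly toward the minimum.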

Key words: reinforcement learning, path planning, SAC, hybrid dynamic sampling, hierarchical heuristic reward function
