Journal of System Simulation (系统仿真学报) ›› 2025, Vol. 37 ›› Issue (9): 2420-2430. DOI: 10.16182/j.issn1004731x.joss.24-0369


Robot Path Planning Based on Improved A-DDQN Algorithm

Ni Peilong, Mao Pengjun, Wang Ning, Yang Mengjie

  1. School of Mechanical and Electrical Engineering, Henan University of Science and Technology, Luoyang 471003, China
  • Received: 2024-04-09  Revised: 2024-04-22  Online: 2025-09-18  Published: 2025-10-24
  • Corresponding author: Mao Pengjun
  • First author: Ni Peilong (born 2000), male, master's student; research interests include robot path planning and reinforcement learning.
  • Funding: Major Science and Technology Project of Luoyang City (2101018A)

Abstract:

An improved A-DDQN algorithm is proposed to address the challenges of reward sparsity and the inability to distinguish sample importance in traditional DQN algorithms during robot path planning. Building on the original DQN, the Double-DQN technique is incorporated: the online Q network selects the action for the next state and the target network evaluates its value, rather than a single network both selecting and evaluating actions through its own maximal predicted Q-value, thereby mitigating the overestimation caused by maximization. Secondly, the concept of the artificial potential field (APF) is introduced to design a reward for each step of the robot's movement, guiding the robot and alleviating the sparse-reward problem. Lastly, a prioritized experience replay (PER) mechanism is integrated, which ranks experiences by priority and adjusts their sampling probabilities accordingly, accelerating learning and improving performance. A comparative analysis of path planning before and after the improvements on two-dimensional grid maps shows that the improved A-DDQN algorithm reduces the path length, number of iterations, and number of turning points by 11.5%, 23.1%, and 61.5%, respectively, in small-scale maps; by 19.4%, 50.0%, and 52.9% in large-scale maps with sparse obstacles; and by 29.7%, 48.1%, and 64.3% in large-scale maps with dense obstacles. The simulation results demonstrate that the improved algorithm achieves faster convergence and better path-planning performance.

Key words: robots, path planning, deep reinforcement learning, artificial potential field (APF), prioritized experience replay (PER)

CLC number:
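To make the three ingredients described in the abstract concrete, the following is a minimal, self-contained Python/PyTorch sketch, not the authors' implementation: an APF-style shaped step reward, a Double-DQN target in which the online network selects the next action and the target network evaluates it, and proportional prioritized experience replay. The network architecture, grid-world state encoding, and all hyperparameters (k_att, k_rep, d0, alpha, beta, gamma) are illustrative assumptions rather than values from the paper.

```python
import numpy as np
import torch
import torch.nn as nn


def apf_shaped_reward(pos, next_pos, goal, obstacles, k_att=1.0, k_rep=0.5, d0=2.0):
    """APF-style step reward: the drop in attractive potential toward the goal,
    minus a repulsive penalty when the next cell lies within d0 of an obstacle."""
    att = lambda p: 0.5 * k_att * np.linalg.norm(np.asarray(goal) - np.asarray(p)) ** 2
    rep = 0.0
    for obs in obstacles:
        d = np.linalg.norm(np.asarray(next_pos) - np.asarray(obs))
        if d < d0:
            rep += 0.5 * k_rep * (1.0 / max(d, 1e-6) - 1.0 / d0) ** 2
    return (att(pos) - att(next_pos)) - rep


class QNet(nn.Module):
    """Small MLP mapping a 2-D grid position to Q-values for 8 move directions."""
    def __init__(self, state_dim=2, n_actions=8, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_actions))

    def forward(self, x):
        return self.net(x)


class PERBuffer:
    """Proportional prioritized replay: P(i) is proportional to priority_i ** alpha."""
    def __init__(self, capacity=10000, alpha=0.6):
        self.data, self.prio = [], []
        self.capacity, self.alpha = capacity, alpha

    def push(self, transition):
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.prio.pop(0)
        self.data.append(transition)
        self.prio.append(max(self.prio, default=1.0))  # new samples get the current max priority

    def sample(self, batch_size, beta=0.4):
        probs = np.array(self.prio) ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        weights = (len(self.data) * probs[idx]) ** (-beta)  # importance-sampling weights
        weights /= weights.max()
        batch = [self.data[i] for i in idx]
        return idx, batch, torch.tensor(weights, dtype=torch.float32)

    def update(self, idx, td_errors):
        for i, e in zip(idx, td_errors):
            self.prio[i] = abs(float(e)) + 1e-5  # refresh priorities with |TD error|


def ddqn_per_update(online, target, buffer, optimizer, batch_size=32, gamma=0.99):
    """One importance-weighted Double-DQN update from a prioritized batch."""
    idx, batch, w = buffer.sample(batch_size)
    s, a, r, s2, done = (torch.tensor(np.array(x), dtype=torch.float32) for x in zip(*batch))
    q = online(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        a_star = online(s2).argmax(dim=1, keepdim=True)   # online net selects the action
        q_next = target(s2).gather(1, a_star).squeeze(1)  # target net evaluates it
        y = r + gamma * (1.0 - done) * q_next
    td = y - q
    loss = (w * td.pow(2)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    buffer.update(idx, td.detach())
    return loss.item()
```

In a grid-world training loop, the shaped reward from apf_shaped_reward would be added to the sparse goal/collision reward before each transition is pushed into PERBuffer, and ddqn_per_update would be called once the buffer holds at least one batch; the target network would be synchronized with the online network at a fixed interval, as in standard DQN practice.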