Journal of System Simulation (系统仿真学报) ›› 2025, Vol. 37 ›› Issue (9): 2420-2430. DOI: 10.16182/j.issn1004731x.joss.24-0369


Robot Path Planning Based on Improved A-DDQN Algorithm

Ni Peilong, Mao Pengjun, Wang Ning, Yang Mengjie

  1. School of Mechanical and Electrical Engineering, Henan University of Science and Technology, Luoyang 471003, China
  • Received: 2024-04-09  Revised: 2024-04-22  Online: 2025-09-18  Published: 2025-10-24
  • Corresponding author: Mao Pengjun
  • First author: Ni Peilong (born 2000), male, master's student; research interests include robot path planning and reinforcement learning.
  • Funding: Major Science and Technology Project of Luoyang City (2101018A)

Abstract:

An improved A-DDQN algorithm is proposed to address the challenges of reward sparsity and the inability to distinguish sample importance in traditional DQN algorithms during robot path planning. Building on the original DQN, the Double-DQN technique is incorporated: the online Q network selects the action for the next state and the target network evaluates its value, rather than a single network both selecting and evaluating actions through its own maximal predicted Q-value, thereby mitigating the overestimation caused by maximization. Secondly, the concept of the artificial potential field (APF) is introduced to design a reward for each step of the robot's movement, guiding the robot and alleviating the sparse-reward problem. Lastly, a prioritized experience replay (PER) mechanism is integrated, which ranks experiences by priority and adjusts their sampling probabilities accordingly, accelerating learning and improving performance. A comparative analysis of path planning before and after the improvements on two-dimensional grid maps shows that the improved A-DDQN algorithm reduces the path length, number of iterations, and number of turning points by 11.5%, 23.1%, and 61.5%, respectively, in small-scale maps; by 19.4%, 50.0%, and 52.9% in large-scale maps with sparse obstacles; and by 29.7%, 48.1%, and 64.3% in large-scale maps with dense obstacles. The simulation results demonstrate that the improved algorithm achieves faster convergence and better path-planning performance.

Key words: robots, path planning, deep reinforcement learning, artificial potential field (APF), prioritized experience replay (PER)

CLC number:
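To make the three ingredients described in the abstract concrete, the following is a minimal, self-contained Python/PyTorch sketch, not the authors' implementation: an APF-style shaped step reward, a Double-DQN target in which the online network selects the next action and the target network evaluates it, and proportional prioritized experience replay. The network architecture, grid-world state encoding, and all hyperparameters (k_att, k_rep, d0, alpha, beta, gamma) are illustrative assumptions rather than values from the paper.

```python
import numpy as np
import torch
import torch.nn as nn


def apf_shaped_reward(pos, next_pos, goal, obstacles, k_att=1.0, k_rep=0.5, d0=2.0):
    """APF-style step reward: the drop in attractive potential toward the goal,
    minus a repulsive penalty when the next cell lies within d0 of an obstacle."""
    att = lambda p: 0.5 * k_att * np.linalg.norm(np.asarray(goal) - np.asarray(p)) ** 2
    rep = 0.0
    for obs in obstacles:
        d = np.linalg.norm(np.asarray(next_pos) - np.asarray(obs))
        if d < d0:
            rep += 0.5 * k_rep * (1.0 / max(d, 1e-6) - 1.0 / d0) ** 2
    return (att(pos) - att(next_pos)) - rep


class QNet(nn.Module):
    """Small MLP mapping a 2-D grid position to Q-values for 8 move directions."""
    def __init__(self, state_dim=2, n_actions=8, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_actions))

    def forward(self, x):
        return self.net(x)


class PERBuffer:
    """Proportional prioritized replay: P(i) is proportional to priority_i ** alpha."""
    def __init__(self, capacity=10000, alpha=0.6):
        self.data, self.prio = [], []
        self.capacity, self.alpha = capacity, alpha

    def push(self, transition):
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.prio.pop(0)
        self.data.append(transition)
        self.prio.append(max(self.prio, default=1.0))  # new samples get the current max priority

    def sample(self, batch_size, beta=0.4):
        probs = np.array(self.prio) ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        weights = (len(self.data) * probs[idx]) ** (-beta)  # importance-sampling weights
        weights /= weights.max()
        batch = [self.data[i] for i in idx]
        return idx, batch, torch.tensor(weights, dtype=torch.float32)

    def update(self, idx, td_errors):
        for i, e in zip(idx, td_errors):
            self.prio[i] = abs(float(e)) + 1e-5  # refresh priorities with |TD error|


def ddqn_per_update(online, target, buffer, optimizer, batch_size=32, gamma=0.99):
    """One importance-weighted Double-DQN update from a prioritized batch."""
    idx, batch, w = buffer.sample(batch_size)
    s, a, r, s2, done = (torch.tensor(np.array(x), dtype=torch.float32) for x in zip(*batch))
    q = online(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        a_star = online(s2).argmax(dim=1, keepdim=True)   # online net selects the action
        q_next = target(s2).gather(1, a_star).squeeze(1)  # target net evaluates it
        y = r + gamma * (1.0 - done) * q_next
    td = y - q
    loss = (w * td.pow(2)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    buffer.update(idx, td.detach())
    return loss.item()
```

In a grid-world training loop, the shaped reward from apf_shaped_reward would be added to the sparse goal/collision reward before each transition is pushed into PERBuffer, and ddqn_per_update would be called once the buffer holds at least one batch; the target network would be synchronized with the online network at a fixed interval, as in standard DQN practice.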