Intelligent Path Planning for Mobile Robots Based on SAC Algorithm

doi:10.16182/j.issn1004731x.joss.22-0412

Abstract

Key words: deep reinforcement learning, path planning, soft actor-critic (SAC) algorithm, continuous reward functions, mobile robots

Laiyi Yang, Jing Bi, Haitao Yuan. Intelligent Path Planning for Mobile Robots Based on SAC Algorithm[J]. Journal of System Simulation, 2023, 35(8): 1726-1736.

Figures and tables (11): Figures 1-9, Tables 1-2

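Table 1
Parameter	Value
Discount factor	0.95
Learning rate	0.0003
Training batch size	256
Steps per iteration	1000
Replay buffer size	10⁷
Temperature coefficient	0.005
Initial value of α	1
Gradient steps	1
Exploration function	Random sampling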
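
As a minimal sketch, the settings in the table above could be gathered into one immutable configuration object. The class name `SACConfig` and every field name are hypothetical, chosen here for illustration; only the values come from the table, and the mapping of each row onto a particular SAC implementation's arguments is an assumption.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SACConfig:
    """Hypothetical container: field names are illustrative, values are from the table."""

    discount_factor: float = 0.95
    learning_rate: float = 3e-4
    batch_size: int = 256
    steps_per_iteration: int = 1000
    replay_buffer_size: int = 10**7
    temperature_coefficient: float = 0.005  # label kept exactly as the table gives it
    alpha_init: float = 1.0                 # initial value of alpha
    gradient_steps: int = 1
    exploration: str = "random_sampling"


config = SACConfig()

# Print one "name = value" line per setting for a quick visual check.
for name, value in asdict(config).items():
    print(f"{name} = {value}")
```

Freezing the dataclass keeps the values from being changed by accident during a training run; a variant can be created with `dataclasses.replace(config, batch_size=128)`.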

Table 2
Metric	SAC	PPO
Arrival rate (%)	94	71
Average number of steps	342.63	301.29
Average path length (m)	726.38	1072.93
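
To make the comparison in the table above easier to read, the following sketch computes how far each SAC value deviates from the corresponding PPO value. The numbers are copied from the table; the dictionary keys and the helper `relative_change_pct` are hypothetical names introduced for this example only.

```python
# Values copied from the comparison table; key names are illustrative.
results = {
    "arrival_rate_pct": {"SAC": 94.0, "PPO": 71.0},
    "avg_steps": {"SAC": 342.63, "PPO": 301.29},
    "avg_path_length_m": {"SAC": 726.38, "PPO": 1072.93},
}


def relative_change_pct(sac: float, ppo: float) -> float:
    """Percentage change of the SAC value relative to the PPO value."""
    return (sac - ppo) / ppo * 100.0


for metric, row in results.items():
    change = relative_change_pct(row["SAC"], row["PPO"])
    print(f"{metric}: {change:+.1f}% relative to PPO")
```

Read this way, SAC's average path is roughly 32% shorter than PPO's while its average step count is about 14% higher, and its arrival rate is 23 percentage points higher.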
