Path Planning of Mobile Robots Based on Memristor Reinforcement Learning in Dynamic Environment

doi:10.16182/j.issn1004731x.joss.22-0334

Journal of System Simulation, 2023, Vol. 35, Issue (7): 1619-1633. doi: 10.16182/j.issn1004731x.joss.22-0334

Hailan Yang¹, Yongqiang Qi¹, Baolei Wu², Dan Rong¹, Miaoying Hong¹, Jun Wang³

1. School of Mathematics, China University of Mining and Technology, Xuzhou 221116, Jiangsu, China
2. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, Jiangsu, China
3. School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, Jiangsu, China

Received: 2022-04-11  Revised: 2022-07-07  Online: 2023-07-29  Published: 2023-07-19
Corresponding author: Yongqiang Qi. E-mail: yhailan163@163.com; qiyongqiang@163.com
About the author: Hailan Yang (1999-), female, master's student; research interest: intelligent robot control. E-mail: yhailan163@163.com
Funding: National Natural Science Foundation of China (61304088); Fundamental Research Funds for the Central Universities (2013QNA37); China Postdoctoral Science Foundation (2015M581886); Hybrid Perception in Unstructured Environments (2020ZDPY0217); Laboratory Open Fund of China University of Mining and Technology (2020SYKF42); Future Outstanding Talents Support Program of China University of Mining and Technology (2022WLJCRCZL134)

Abstract

To solve the path planning problem of mobile robots in dynamic environments, a two-layer path planning algorithm is proposed that combines an improved ant colony algorithm with a DQN (deep Q-network) algorithm based on a memristor array (MA-DQN). Static global path planning is performed by an ant colony algorithm with an improved probability transition function and an improved pheromone update rule. For local dynamic obstacle avoidance, the in-memory computing property of the memristor is exploited: memristors serve as the synapses of the neural network, which modifies the structure of the conventional DQN algorithm. The path planning mechanism is switched according to whether a dynamic obstacle is present within the sensing range of the mobile robot, which completes the path planning task in the dynamic environment. Simulation results show that the algorithm is effective and feasible, and that it can plan a feasible path for the mobile robot in real time in a dynamic environment.

Key words: dynamic environment, deep Q-network (DQN), memristor, in-memory computing, path planning
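
The switching mechanism described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the Euclidean sensing test, and the `local_policy` callback (standing in for the trained MA-DQN agent) are all assumptions made for illustration.

```python
import math

def dynamic_obstacle_in_range(robot, obstacles, sensing_radius):
    """Return True if any dynamic obstacle lies within the sensing radius."""
    return any(math.dist(robot, ob) <= sensing_radius for ob in obstacles)

def next_step(robot, global_path, path_index, dynamic_obstacles,
              sensing_radius, local_policy):
    """Choose the next waypoint of the two-layer planner.

    global_path  : waypoints from the static global planner (assumed given)
    local_policy : hypothetical callable standing in for the local
                   obstacle-avoidance agent; returns the next position
    """
    if dynamic_obstacle_in_range(robot, dynamic_obstacles, sensing_radius):
        # A dynamic obstacle is sensed: hand control to the local layer.
        return "local", local_policy(robot, dynamic_obstacles)
    # Otherwise keep following the precomputed global path.
    nxt = min(path_index + 1, len(global_path) - 1)
    return "global", global_path[nxt]
```

In this sketch the global path is computed once and reused; only the decision of which layer acts at each step depends on the sensed obstacles.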

CLC number: TP242

Hailan Yang, Yongqiang Qi, Baolei Wu, Dan Rong, Miaoying Hong, Jun Wang. Path Planning of Mobile Robots Based on Memristor Reinforcement Learning in Dynamic Environment[J]. Journal of System Simulation, 2023, 35(7): 1619-1633.

Figures/Tables (21)

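Table 1. Parameters of the improved ant colony algorithm

Parameter	Value
Number of ants $m$	50
Maximum number of iterations $N_{\max}$	100
Pheromone heuristic factor $\alpha$	1
Distance heuristic factor $\beta$	7
Turning heuristic factor $\kappa$	1
Evaporation coefficient $\rho$	0.5
Pheromone intensity $Q_1$	10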
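
To show how the parameters in Table 1 typically enter an ant colony algorithm, the sketch below uses the classic ant-system transition rule and pheromone update. It is an assumption-laden illustration: the improved transition function and update rule of the paper are not given on this page, so the turning heuristic factor $\kappa$ is omitted, and storing pheromone per node (rather than per edge) is a simplification chosen here.

```python
# Values taken from Table 1; kappa is omitted because its functional form is not given.
ALPHA, BETA, RHO, Q1 = 1.0, 7.0, 0.5, 10.0

def transition_probabilities(pheromone, heuristic, candidates):
    """Classic rule: p_j is proportional to pheromone[j]**ALPHA * heuristic[j]**BETA.

    heuristic[j] is a desirability value such as 1 / distance (assumed here).
    """
    weights = {j: pheromone[j] ** ALPHA * heuristic[j] ** BETA for j in candidates}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

def update_pheromone(pheromone, ant_paths):
    """Evaporate by RHO, then deposit Q1 / path_length on each visited node.

    ant_paths is a list of (visited_nodes, path_length) pairs.
    """
    new = {j: (1.0 - RHO) * t for j, t in pheromone.items()}
    for nodes, length in ant_paths:
        for j in nodes:
            new[j] += Q1 / length
    return new
```

With $\beta = 7$ the heuristic term dominates the choice, so candidates with a smaller heuristic value receive very little probability unless their pheromone level is much higher.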

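Table 2. Memristor parameters

Parameter	Value
Minimum resistance $R_{\mathrm{on}}/\Omega$	100
Maximum resistance $R_{\mathrm{off}}/\Omega$	2 000
Initial memristor value $x_0$	0.1
Linear drift coefficient $\mu_v/(\mathrm{m^2 \cdot s^{-1} \cdot V^{-1}})$	$10^{-14}$
Memristor length $D/\mathrm{nm}$	10
Voltage threshold $V_{\mathrm{th}}/\mathrm{V}$	0.1
Window function coefficient $p$	5
Frequency $f/\mathrm{Hz}$	6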
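
The parameters in Table 2 resemble those of the linear ion drift memristor model with a Joglekar-type window function, so a minimal simulation sketch under that assumption is given below. The exact device model of the paper is not stated on this page; the sinusoidal drive, its 1 V amplitude, the Euler time step, and the way the voltage threshold gates the state update are all assumptions.

```python
import math

# Values from Table 2, converted to SI units.
R_ON, R_OFF = 100.0, 2000.0   # ohm
X0 = 0.1                      # initial normalized state
MU_V = 1e-14                  # m^2 s^-1 V^-1
D = 10e-9                     # m
V_TH = 0.1                    # V
P = 5                         # window function coefficient
FREQ = 6.0                    # Hz

def memristance(x):
    """Resistance for normalized state x in [0, 1]."""
    return R_ON * x + R_OFF * (1.0 - x)

def window(x):
    """Joglekar window: 1 - (2x - 1)^(2p), zero at both boundaries."""
    return 1.0 - (2.0 * x - 1.0) ** (2 * P)

def simulate(amplitude=1.0, periods=1, dt=1e-5):
    """Euler integration of dx/dt = mu_v * R_on / D^2 * i(t) * window(x)."""
    x = X0
    trace = []
    steps = int(round(periods / FREQ / dt))
    for n in range(steps):
        v = amplitude * math.sin(2.0 * math.pi * FREQ * n * dt)
        i = v / memristance(x)
        if abs(v) > V_TH:  # assumed: the state only drifts above the threshold
            x += MU_V * R_ON / D ** 2 * i * window(x) * dt
            x = min(max(x, 0.0), 1.0)
        trace.append((v, i, x))
    return trace
```

Plotting current against voltage from `simulate()` would be one way to inspect the hysteresis behaviour of this assumed model.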

Table 3. MA-DQN parameters

Parameter	Value
Learning rate $\alpha$	0.01
Discount factor $\gamma$	0.9
Exploration factor $\varepsilon$	0.1
Number of training episodes	1000
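
As a rough guide to the role of each parameter in Table 3, the sketch below shows $\varepsilon$-greedy action selection and the one-step temporal-difference target used in Q-learning and DQN. It is a scalar stand-in written for illustration, not the memristor-based network of the paper: in a DQN the learning rate is the step size of the network optimizer, whereas `q_update` here applies it to a single Q-value.

```python
import random

LEARNING_RATE, GAMMA, EPSILON = 0.01, 0.9, 0.1  # values from Table 3

def epsilon_greedy(q_values, rng=random):
    """With probability EPSILON pick a random action, otherwise the greedy one."""
    if rng.random() < EPSILON:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)

def td_target(reward, next_q_values, done):
    """One-step target: r if the episode ended, else r + GAMMA * max Q(s', a')."""
    return reward if done else reward + GAMMA * max(next_q_values)

def q_update(q_value, target):
    """Move a Q-value toward its target by a step of size LEARNING_RATE."""
    return q_value + LEARNING_RATE * (target - q_value)
```

With $\varepsilon = 0.1$ the agent follows its current estimate most of the time and explores on roughly one step in ten.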


