Journal of System Simulation ›› 2019, Vol. 31 ›› Issue (10): 2155-2163. doi: 10.16182/j.issn1004731x.joss.18-0820

• Simulation Applications & Engineering •

Dynamic Inventory Routing Optimization Based on Deep Reinforcement Learning

Zhou Jianpin1, Zhang Shuliu2

  1. School of Navigation, Jimei University, Xiamen 361021, China;
    2. Jilin Power Supply Company of State Grid, Jilin 132000, China
  • Received: 2018-12-10 Revised: 2019-02-15 Online: 2019-10-10 Published: 2019-12-12
  • About the authors: Zhou Jianpin (1968-), male, Fujian, Ph.D., associate professor; research interests: artificial intelligence and supply-chain system simulation. Zhang Shuliu (1989-), female, Jilin, M.S., assistant engineer; research interests: electrical engineering and project management.
  • Funding:
    Natural Science Foundation of Fujian Province (2017J01797, 2017J01796)



Abstract: For the dynamic stochastic inventory routing problem with periodically fluctuating demand, a novel simulation optimization approach based on deep reinforcement learning is proposed to achieve a periodic steady strategy. First, a dynamic combinatorial optimization model is constructed. Then, deep reinforcement learning combined with heuristic rules determines the set of replenishment nodes and the replenishment batch allocation weights for each period. Simulation results show that, compared with two solution methods from the existing literature, the proposed method improves the average profit per cycle by about 2.7% and 3.9% under low demand fluctuation and by about 8.2% and 7.1% under high demand fluctuation, while the cycle service level remains stable within a small fluctuation range across different demand fluctuation environments.
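The abstract describes an agent that, each period, uses deep Q-learning plus heuristic rules to choose which nodes to replenish. The sketch below is illustrative only, not the paper's implementation: a toy inventory environment with cyclic demand and a minimal two-layer Q-network trained by one-step temporal-difference updates. All names, cost coefficients, and problem sizes here are assumptions chosen for demonstration.

```python
# Illustrative sketch only -- NOT the paper's implementation. A toy deep
# Q-learning loop that picks which retailer nodes to replenish each period;
# costs, demand model, and sizes are assumed values for demonstration.
import numpy as np

rng = np.random.default_rng(0)

N_NODES = 3                               # retailer nodes served by one depot
ACTIONS = [tuple(a) for a in np.ndindex(*(2,) * N_NODES)]  # replenish subsets
STATE_DIM = N_NODES + 1                   # inventories + phase of demand cycle
HID = 16                                  # hidden width of the tiny Q-network

W1 = rng.normal(0.0, 0.1, (STATE_DIM, HID))
W2 = rng.normal(0.0, 0.1, (HID, len(ACTIONS)))

def q_values(s):
    """Two-layer ReLU network mapping a state to one Q-value per action."""
    h = np.maximum(0.0, s @ W1)
    return h @ W2, h

def step(inv, phase, action):
    """Toy environment: cyclic stochastic demand, holding cost, dispatch cost."""
    demand = np.maximum(0.0, 2.0 + np.sin(2 * np.pi * phase / 7)
                        + rng.normal(0.0, 0.3, N_NODES))
    inv = inv + 5.0 * np.array(action)            # replenish selected nodes
    sold = np.minimum(inv, demand)
    reward = 3.0 * sold.sum() - 0.5 * inv.sum() - 4.0 * (sum(action) > 0)
    return np.clip(inv - demand, 0.0, None), (phase + 1) % 7, reward

gamma, lr, eps = 0.9, 1e-3, 0.2                   # discount, step size, explore
inv, phase = np.full(N_NODES, 5.0), 0
for t in range(1000):
    s = np.append(inv / 10.0, phase / 7.0)        # crude state normalization
    q, h = q_values(s)
    a = int(rng.integers(len(ACTIONS))) if rng.random() < eps else int(np.argmax(q))
    inv, phase, r = step(inv, phase, ACTIONS[a])
    s2 = np.append(inv / 10.0, phase / 7.0)
    td = q[a] - (r + gamma * q_values(s2)[0].max())   # one-step TD error
    grad_w2 = td * h                                  # grads before updating
    grad_w1 = td * np.outer(s, W2[:, a] * (h > 0.0))
    W2[:, a] -= lr * grad_w2
    W1 -= lr * grad_w1
```

In the paper's setting the action space is additionally pruned by heuristic rules before scoring, and replenishment quantities come from learned allocation weights; the full 2^N subset space is enumerated here only because N_NODES is tiny.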

Key words: inventory routing problem, heuristic rules, deep Q-learning, dynamics, periodic steady strategy
