Intelligent Optimization of Coal Terminal Unloading Scheduling Based on Improved D3QN Algorithm

doi:10.16182/j.issn1004731x.joss.22-1320

Abstract

Intelligent scheduling decisions can improve the operating efficiency of large ports and are an important research direction for applying artificial intelligence in smart-port scenarios. This article formulates the intelligent unloading scheduling task of a coal terminal as a Markov sequential decision problem and builds a deep reinforcement learning model for it. Because the action space of this model is high-dimensional and its feasible actions are sparse, an improved D3QN algorithm is proposed to optimize unloading scheduling decisions. Simulation results show that, for the same set of random task sequences, the optimized scheduling policy is clearly more efficient than a random policy. When the trained policy is applied directly to newly generated random task sequences, scheduling efficiency improves by 5%~7%, which indicates that the method generalizes well. In addition, as decision models grow more complex, traditional heuristic optimization algorithms face prominent problems such as difficult modeling and low solving efficiency. The proposed algorithm offers a new approach to this class of problems and is expected to support wider use of deep reinforcement learning-based intelligent decision-making in port scheduling tasks.

Key words: terminal unloading scheduling, scheduling strategy optimization, intelligent decision-making, deep reinforcement learning, Dueling Double DQN algorithm
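The abstract names two ingredients: a Dueling Double DQN (D3QN) learner and an action space in which most actions are infeasible. The specific improvement made to D3QN is not described in this section, so the following is only a minimal sketch of standard D3QN building blocks combined with feasibility masking, which is one common way to handle sparse feasible actions and is an assumption here. All function names are hypothetical.

```python
from typing import List, Sequence

NEG_INF = float("-inf")


def dueling_q(value: float, advantages: Sequence[float]) -> List[float]:
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def masked_argmax(q_values: Sequence[float], feasible: Sequence[bool]) -> int:
    """Greedy action restricted to feasible actions (assumed masking rule)."""
    masked = [q if ok else NEG_INF for q, ok in zip(q_values, feasible)]
    best = max(range(len(masked)), key=lambda a: masked[a])
    if masked[best] == NEG_INF:
        raise ValueError("no feasible action in this state")
    return best


def double_dqn_target(
    reward: float,
    gamma: float,
    done: bool,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    feasible_next: Sequence[bool],
) -> float:
    """Double DQN target: the online network selects the next action,
    the target network evaluates it."""
    if done:
        return reward
    a_star = masked_argmax(q_online_next, feasible_next)
    return reward + gamma * q_target_next[a_star]
```

Masking at action-selection time keeps infeasible actions from ever being chosen or bootstrapped from, whereas an unmasked agent has to learn to avoid them through penalties.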

CLC number: TP391.9

Qin Baoxin, Zhang Yuxiao, Wu Sirui, et al. Intelligent Optimization of Coal Terminal Unloading Scheduling Based on Improved D3QN Algorithm[J]. Journal of System Simulation, 2024, 36(3): 770-781.

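Table 1

State information of the port unloading process

Symbol	Meaning
n	Total number of trains
F	Set of car dumpers at the terminal
D	Set of stackers at the terminal
k	Index of an arriving train
i,j	Indices of the car dumper and the stacker, i,j = 1,2,3,4
$\pi_{ij}^{k}$	Car dumper and stacker selected for train k
$A_{ij}^{k}$	1 if train k uses car dumper i; 0 if the action is infeasible because the equipment is occupied
t_i	Remaining working time of car dumper i
t_j	Remaining working time of stacker j
T^k	Operation time of train k
T_all	Total completion time of all trains, $T_{\mathrm{all}} = \sum_{k=1}^{n} T^{k}$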
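The symbols above can be made concrete with a short sketch. It assumes, as an illustration only, that a (car dumper i, stacker j) pair is feasible exactly when both remaining working times t_i and t_j are zero. The table itself only states that occupied equipment makes an action infeasible. The function names are hypothetical.

```python
from typing import List, Sequence


def feasibility_mask(
    t_dumper: Sequence[float], t_stacker: Sequence[float]
) -> List[List[int]]:
    """Return A[i][j] = 1 when car dumper i and stacker j are both idle
    (assumed rule), and 0 when either one is still occupied."""
    return [
        [1 if ti == 0 and tj == 0 else 0 for tj in t_stacker]
        for ti in t_dumper
    ]


def total_completion_time(train_times: Sequence[float]) -> float:
    """T_all as the sum of the per-train operation times T^k."""
    return sum(train_times)


# Example: dumpers 2 and 4 and stackers 1 and 4 are still busy.
mask = feasibility_mask([0, 3, 0, 1], [2, 0, 0, 5])
```

With four car dumpers and four stackers the mask has 16 entries per train, and in the example only four of them are feasible, which illustrates why feasible actions are sparse.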

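Table 2

State information of the port unloading process

State space	Data structure	Dimension	Type	Meaning
T_wait	Array	4	Int	Waiting time of each train
M_size	Array	42	Int	Stock of each of the 42 coal piles
M_kind	Array	42	Int	Coal type of each of the 42 coal piles
X_size	Array	4	Int	Coal load of each train
X_kind	Array	4	Int	Coal type carried by each train
Q	Array	4	Bool	Conveyor belt occupancy
Z	Array	4	Bool	Car dumper occupancy
Q_i	Array	4	Int	Coal pile assigned to each of the 4 conveyor belts
C_i	Array	4	Int	Car dumper assigned to each of the 4 trains
$T_{\mathrm{occupy}}^{i}$	Array	4	Int	Occupied time of each of the 4 conveyor belts
Q_z	Array	4	Int	Correspondence between stackers and car dumpers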

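Table 5

Historical scheduling records of the port

Car dumper	Wagon type	Coal type	Actual operation time/h	Actual waybill tonnage/t
CD2	C80	Waigou 55 (外购55)	111.80	4 320
CD1	C80	Shenhun 45 (神混45)	90.38	4 320
CD4	C80	Waigou 55 (外购55)	77.50	4 320
CD3	C64	Waishi 5 000 (外石5 000)	0	768
$⋮$
CD4	C80	Waigou 45 (外购45)	78.63	4 320
CD3	C64	Waishi 5 000 (外石5 000)	92.78	4 224
CD2	C80	Shenhun 45 (神混45)	82.53	4 320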

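Task	Decision method	Illegal actions	Stacker switches	Total scheduling time/h	Improvement/%
1	Random policy	Baseline	[20,60,58,19]	321.21	Baseline
1	Trained model policy	Approx. 10	[16,35,31,16]	301.10	6.7
2	Random policy	Baseline	[33,55,63,25]	332.45	Baseline
2	Trained model policy	Approx. 12	[21,38,30,18]	309.21	6.4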

Month	Manual scheduling t/h	Intelligent scheduling t/h	Improvement/%
1	824.03	691.70	16.0
2	721.24	693.79	3.80
3	658.05	620.14	5.80
4	798.50	621.92	22.4
5	710.46	693.23	2.40
6	807.93	632.84	21.7
