Journal of System Simulation, 2023, Vol. 35, Issue 11: 2345-2358. doi: 10.16182/j.issn1004731x.joss.22-0666


  • First author: Ni Jing (b. 1972), female, associate professor, Ph.D.; research interests: online social networks and intelligent optimization algorithms. E-mail: nijing501@126.com
  • Funding:
    Humanities and Social Sciences Fund of the Ministry of Education (19YJAZH064)

Intercell Dynamic Scheduling Method Based on Deep Reinforcement Learning

Ni Jing, Ma Mengke

  1. University of Shanghai for Science and Technology, Shanghai 200093, China
  • Received:2022-06-20 Revised:2022-09-04 Online:2023-11-25 Published:2023-11-24


Abstract:

To solve the intercell scheduling problem with dynamically arriving machining tasks, and to achieve adaptive scheduling in the complex and changeable environment of an intelligent workshop, a scheduling method based on a deep Q-network is proposed. A complex network is constructed with cells as nodes and the intercell machining paths of workpieces as directed edges, and the degree value is introduced to define a state space that captures intercell scheduling characteristics. A compound scheduling rule composed of a workpiece layer, a cell layer, and a machine layer is designed; this hierarchical optimization makes the scheduling scheme more global. Since double deep Q-networks (DDQN) still select sub-optimal actions in the later stages of training, a search strategy built around an exponential function is proposed. Simulation experiments of different scales verify that the proposed method can cope with changeable dynamic environments and quickly generate near-optimal scheduling schemes.
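Two elements of the abstract can be sketched concretely: the degree-value state features (cells as nodes, intercell machining paths as directed edges) and an exponentially decaying exploration rate that suppresses random, potentially sub-optimal actions late in training. The toy cell network, the decay constants, and the function names below are illustrative assumptions, not values or code from the paper.

```python
import math

def epsilon(step, eps_min=0.01, eps_max=1.0, k=1e-3):
    # Exploration schedule with an exponential function as its main body:
    # near eps_max early on, decaying toward eps_min as training proceeds,
    # so the agent rarely takes random (possibly sub-optimal) actions late.
    return eps_min + (eps_max - eps_min) * math.exp(-k * step)

def degree_state(routes, cells):
    # Cells are nodes; each workpiece's intercell machining path contributes
    # directed edges. The in-/out-degree of every cell is collected into a
    # fixed-length state vector reflecting intercell load and flow.
    out_deg = {c: 0 for c in cells}
    in_deg = {c: 0 for c in cells}
    for path in routes:                      # e.g. ["C1", "C3", "C2"]
        for a, b in zip(path, path[1:]):     # consecutive cells form an edge
            out_deg[a] += 1
            in_deg[b] += 1
    return [out_deg[c] for c in cells] + [in_deg[c] for c in cells]

# Toy example: three cells, two workpieces with intercell routes.
cells = ["C1", "C2", "C3"]
routes = [["C1", "C3", "C2"], ["C2", "C3"]]
print(degree_state(routes, cells))   # → [1, 1, 1, 0, 1, 2]
print(epsilon(0))                    # → 1.0 (full exploration at the start)
```

In a DDQN loop, the agent would draw a uniform random number each step and pick a random compound rule when it falls below `epsilon(step)`, otherwise the greedy action from the online network.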

Key words: intercell scheduling, dynamic scheduling, reinforcement learning, degree value, compound rule
