Application of Improved Q Learning Algorithm in Job Shop Scheduling Problem

doi:10.16182/j.issn1004731x.joss.21-0099

Abstract

For job shop scheduling in a dynamic environment, a dynamic scheduling algorithm that combines an improved Q learning algorithm with dispatching rules is proposed. The state space of the algorithm is described using the concept of "the urgency of remaining tasks", and a reward function is designed on the principle of "the higher the slack, the higher the penalty". Because the greedy strategy can still select sub-optimal actions in the later stage of learning, the traditional Q learning algorithm is improved by introducing an action selection strategy based on the softmax function, which makes the probabilities of selecting different actions more nearly equal in the early stage of learning. Simulation results on 6 different test instances show that the performance indicator of the scheduling algorithm improves by about 6.5% on average over the algorithm before improvement, and by about 38.3% and 38.9% over the IPSO and PSO algorithms, respectively. The indicator is also significantly better than that of conventional methods such as a single dispatching rule and traditional optimization algorithms.
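The softmax action selection mentioned in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes the common Boltzmann form in which the parameter `u` (the "Softmax" entry of Table 5) acts as a temperature dividing the Q values, and the function names are hypothetical.

```python
import math
import random


def softmax_probabilities(q_values, u=1.0):
    """Return Boltzmann (softmax) selection probabilities for one Q-table row.

    Assumption: u is a temperature; a larger u flattens the distribution.
    """
    # Subtract the maximum before exponentiating for numerical stability.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / u) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def select_action(q_values, u=1.0, rng=random):
    """Sample an action index according to the softmax probabilities."""
    probs = softmax_probabilities(q_values, u)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

When all Q values in a row are equal, as in an untrained table, every action receives the same probability, which matches the abstract's remark about near-equal selection probabilities early in learning. As the Q values separate, the better actions are sampled more often, but no action is ever excluded outright.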

Key words: reinforcement learning, Q learning, dispatching rules, dynamic scheduling, job shop scheduling

CLC Number:

TB497

Yejian Zhao, Yanhong Wang, Jun Zhang, Hongxia Yu, Zhongda Tian. Application of Improved Q Learning Algorithm in Job Shop Scheduling Problem[J]. Journal of System Simulation, 2022, 34(6): 1247-1258.

Figures/Tables 17

Table 1

Problem description parameter set

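Symbol	Description
$n$	Total number of jobs
$i, l$	Job indices, $i, l = 1, 2, \ldots, n$
$m$	Total number of machines
$M_g$	The $g$-th machine
$g, h$	Machine indices, $g, h = 1, 2, \ldots, m$
$J_i$	The $i$-th job
$j$	Operation index
$O_{ij}$	The $j$-th operation of $J_i$
$n_i$	Number of operations of $J_i$
$t$	Current scheduling instant (each completion of an operation or arrival of a new job counts as a scheduling instant)
$\mathrm{PT}_{ig}$	Processing time of $J_i$ on $M_g$ at time $t$
$\mathrm{NPM}_i(t)$	Number of machines $J_i$ has passed through at time $t$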

Table 2

Mathematical description parameter set

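Symbol	Description
$t_{iM_g}$	Processing time of $J_i$ on $M_g$
$c_{iM_g}$	Completion time of $J_i$ on $M_g$
$a_{iM_hM_g}$	Boolean coefficient: $a_{iM_hM_g} = 1$ if $M_h$ processes $J_i$ before $M_g$ does, otherwise $a_{iM_hM_g} = 0$
$b_{ilM_g}$	Boolean coefficient: $b_{ilM_g} = 1$ if $J_i$ is processed before $J_l$ on the machine ($M_g$), otherwise $b_{ilM_g} = 0$
$A_i$	Release time of $J_i$
$D_i$	Expected completion time of $J_i$
$C_i$	Actual completion time of $J_i$
$\beta$	Adjustment factor, a small positive constant
$k$	Delay coefficient; a negative value indicates that some jobs have already fallen behind schedule during processing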

Table 3

State space parameters and variable sets

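Parameter or variable	Description (over all jobs)
$\mathrm{NPM}_i(t)$	Number of machines $J_i$ has passed through at time $t$
$\mathrm{PT}_{ig}$	Processing time of $J_i$ on $M_g$ at time $t$
$\mathrm{EAST}(t)$	Estimated average slack time of the remaining operations at time $t$
$\mathrm{EART}(t)$	Estimated average remaining processing time at time $t$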

Table 4

State space set

State index	State interval	Action 1	Action 2	Action 3
0	$\mathrm{EAST}(t) \le 0$	$Q(0,0)$	$Q(0,1)$	$Q(0,2)$
1	$0 < \mathrm{EAST}(t) < h \cdot \mathrm{EART}(t)$	$Q(1,0)$	$Q(1,1)$	$Q(1,2)$
2	$h \cdot \mathrm{EART}(t) \le \mathrm{EAST}(t) < 2h \cdot \mathrm{EART}(t)$	$Q(2,0)$	$Q(2,1)$	$Q(2,2)$
3	$2h \cdot \mathrm{EART}(t) \le \mathrm{EAST}(t) < 3h \cdot \mathrm{EART}(t)$	$Q(3,0)$	$Q(3,1)$	$Q(3,2)$
4	$3h \cdot \mathrm{EART}(t) \le \mathrm{EAST}(t) < 4h \cdot \mathrm{EART}(t)$	$Q(4,0)$	$Q(4,1)$	$Q(4,2)$
5	$4h \cdot \mathrm{EART}(t) \le \mathrm{EAST}(t)$	$Q(5,0)$	$Q(5,1)$	$Q(5,2)$
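The six state intervals of Table 4 amount to a simple discretization of EAST(t) against multiples of h times EART(t). The sketch below shows one way to compute the state index. The function name is hypothetical, and the sketch assumes the garbled separators in the table denote multiplication, with EAST(t) = 0 assigned to state 0.

```python
def state_index(east, eart, h=1.0):
    """Map EAST(t) and EART(t) to a state index 0..5 following Table 4.

    east: estimated average slack time of the remaining operations
    eart: estimated average remaining processing time
    h:    state space parameter (Table 5 lists h = 1)
    """
    # State 0: no slack left.
    if east <= 0:
        return 0
    # States 1..4: EAST(t) falls in the band [(s - 1) * h * EART, s * h * EART).
    for s in range(1, 5):
        if east < s * h * eart:
            return s
    # State 5: EAST(t) is at least 4 * h * EART(t).
    return 5
```

For example, `state_index(1.5, 1.0)` falls in the band between h times EART(t) and 2h times EART(t), so it selects row 2 of the Q table.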

Table 5

Values of simulation parameters

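Parameter	Value
Learning rate $\alpha$	0.01
Discount rate $\lambda$	0.9
Reward function parameter $c$	1
State space parameter $h$	1
Range of $k$	{0.2, 0.4, 0.6, 0.8, 1.0}
Episodes	1000
Softmax strategy parameter $u$	1
Uniform distribution	[1, 10]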
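To show how the learning rate and discount rate of Table 5 enter the algorithm, the following sketch applies the standard tabular Q learning update to a 6 x 3 table shaped like Table 4. The reward value passed in is a placeholder, since the paper's reward function is not reproduced in this excerpt, and the function names are hypothetical.

```python
ALPHA = 0.01      # learning rate alpha from Table 5
DISCOUNT = 0.9    # discount rate lambda from Table 5
N_STATES = 6      # rows of Table 4
N_ACTIONS = 3     # action columns of Table 4


def make_q_table():
    """Create a Q table initialised to zero (an assumed initialisation)."""
    return [[0.0] * N_ACTIONS for _ in range(N_STATES)]


def q_update(q, state, action, reward, next_state,
             alpha=ALPHA, discount=DISCOUNT):
    """Apply one standard Q learning update and return the new Q value."""
    # Bootstrapped target: immediate reward plus discounted best next value.
    target = reward + discount * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]
```

With alpha = 0.01, each update moves the stored value only 1% of the way toward the target, so many of the 1000 episodes are needed before the rows of the table separate clearly.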

Table 6

Table 7

Table 8

Table 9

Table 10

Table 11

Table 12

Fig. 1

Fig. 2

Table 13

Fig. 3

Table 14

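Rule	Description
FCFP	Select the job that arrived at the machine first
SPT	Select the job whose operation has the shortest processing time
SL	Select the job with the smallest scheduling slack
LOPNR	Select the job with the fewest remaining operations
MWKR	Select the job with the longest remaining processing time
Random	Select a job at random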
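Each dispatching rule in the table above is a priority function over the jobs waiting at a machine. The sketch below expresses the six rules as selectors over a queue. The `QueuedJob` record and its field names are hypothetical and chosen only for illustration; the source does not specify the data structures used.

```python
import random
from dataclasses import dataclass


@dataclass
class QueuedJob:
    """Hypothetical record for a job waiting at a machine."""
    name: str
    arrival_time: float     # time the job reached this machine's queue
    processing_time: float  # processing time of its next operation
    slack: float            # scheduling slack of the job
    remaining_ops: int      # number of operations still to be processed
    remaining_work: float   # total remaining processing time


# Deterministic rules: each maps a non-empty queue to the chosen job.
RULES = {
    "FCFP": lambda queue: min(queue, key=lambda j: j.arrival_time),
    "SPT": lambda queue: min(queue, key=lambda j: j.processing_time),
    "SL": lambda queue: min(queue, key=lambda j: j.slack),
    "LOPNR": lambda queue: min(queue, key=lambda j: j.remaining_ops),
    "MWKR": lambda queue: max(queue, key=lambda j: j.remaining_work),
}


def dispatch(rule, queue, rng=random):
    """Pick the next job from a non-empty queue under the named rule."""
    if rule == "Random":
        return rng.choice(queue)
    return RULES[rule](queue)
```

Keeping the rules in a dictionary keyed by name makes it straightforward for a learning agent to treat a rule name as an action and call `dispatch` with it at every scheduling instant.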

Rule	k=0.2	k=0.4	k=0.6	k=0.8	k=1.0
FCFP	547.42	523.17	446.46	389.08	312.00
SPT	527.12	510.67	459.66	334.18	303.10
SL	538.02	549.82	470.76	418.88	323.60
LOPNR	563.72	605.42	447.86	405.78	299.30
MWKR	542.62	531.17	475.76	381.48	301.20
Random	533.12	546.12	486.96	379.58	327.00
QL	524.02	507.62	442.26	312.78	292.30
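One way to read the table above is to compare QL with the best single dispatching rule in each column. The sketch below does this for the values listed. It assumes that a lower value of the performance indicator is better, which this excerpt does not state explicitly, and the helper names are hypothetical.

```python
K_VALUES = [0.2, 0.4, 0.6, 0.8, 1.0]

# Values copied from the table above (rows: rule, columns: k).
RESULTS = {
    "FCFP":   [547.42, 523.17, 446.46, 389.08, 312.00],
    "SPT":    [527.12, 510.67, 459.66, 334.18, 303.10],
    "SL":     [538.02, 549.82, 470.76, 418.88, 323.60],
    "LOPNR":  [563.72, 605.42, 447.86, 405.78, 299.30],
    "MWKR":   [542.62, 531.17, 475.76, 381.48, 301.20],
    "Random": [533.12, 546.12, 486.96, 379.58, 327.00],
    "QL":     [524.02, 507.62, 442.26, 312.78, 292.30],
}


def best_single_rule(results, col):
    """Return the non-QL rule with the lowest value in the given column."""
    candidates = {rule: vals[col] for rule, vals in results.items() if rule != "QL"}
    return min(candidates, key=candidates.get)


def gap_to_best_rule(results, col):
    """Percentage by which QL undercuts the best single rule (lower is assumed better)."""
    best = results[best_single_rule(results, col)][col]
    return (best - results["QL"][col]) / best * 100.0


if __name__ == "__main__":
    for col, k in enumerate(K_VALUES):
        rule = best_single_rule(RESULTS, col)
        print(f"k={k}: best single rule {rule}, QL gap {gap_to_best_rule(RESULTS, col):.2f}%")
```

Under the lower-is-better assumption, QL has the smallest value in every column of this particular table, while the best single rule changes with k.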

Rule	k=0.2	k=0.4	k=0.6	k=0.8	k=1.0
FCFP	447.54	423.75	337.52	289.06	248.00
SPT	468.04	456.05	321.52	277.36	230.70
SL	463.74	487.63	394.72	325.66	276.50
LOPNR	457.64	437.63	356.62	292.46	214.30
MWKR	447.54	463.65	344.12	313.66	264.30
Random	483.84	445.13	359.22	293.36	274.70
QL	445.84	440.85	293.42	277.26	212.30

Rule	k=0.2	k=0.4	k=0.6	k=0.8	k=1.0
FCFP	702.24	720.68	522.39	426.07	485.93
SPT	693.04	679.18	565.25	488.83	494.06
SL	759.51	745.68	623.65	572.83	529.67
LOPNR	733.84	730.18	605.79	583.73	513.67
MWKR	687.44	660.88	547.65	482.63	492.06
Random	713.64	692.68	577.99	506.63	480.93
QL	684.44	653.68	521.52	423.36	474.93

Rule	k=0.2	k=0.4	k=0.6	k=0.8	k=1.0
FCFP	763.13	733.23	574.20	507.93	435.00
SPT	765.87	726.63	566.20	512.13	463.93
SL	789.20	745.63	622.47	601.40	526.00
LOPNR	736.67	711.53	589.37	538.74	528.16
MWKR	721.17	704.13	603.20	510.93	487.63
Random	750.47	730.43	615.07	522.53	532.87
QL	704.73	692.83	515.33	494.80	426.73