Journal of System Simulation ›› 2024, Vol. 36 ›› Issue (11): 2699-2711. DOI: 10.16182/j.issn1004731x.joss.23-0978

• Research Paper •


Flexible Job Shop Scheduling Method Based on Collaborative Agent Reinforcement Learning Algorithm

Li Jian1,2, Li Huankun1, He Pengbo1, Wang Huabei1, Xu Liping1,2, He Kui1,2   

  1. School of Mechatronics Engineering, Henan University of Science and Technology, Luoyang 471000, China
  2. Henan Collaborative Innovation Center for Advanced Manufacturing of Mechanical Equipment, Luoyang 471000, China
  • Received: 2023-08-04  Revised: 2023-08-24  Online: 2024-11-13  Published: 2024-11-19
  • First author: Li Jian (1972-), male, associate professor, Ph.D.; research interest: intelligent factories.
  • Funding:
    National Key R&D Program of China (2018YFB1701205); Science and Technology Research Project of Henan Province (212102210356)


Abstract:

To improve the efficiency of flexible job shop scheduling, a Markov decision process incorporating the constraints of the flexible job shop scheduling problem (FJSP) is constructed, and a collaborative agent reinforcement learning method is proposed to solve the coupled problem of simultaneously selecting workpieces and machines. In constructing the Markov decision process, a disjunctive graph is introduced to represent the state features, two agents are employed to select workpieces and machines, and the reward over the entire scheduling process is mapped by predicting the difference in the minimized maximum completion time (makespan) between successive decision points. In the solving procedure, a graph isomorphism network (GIN) is embedded to extract state features, encoder-decoder components are set up for the workpiece and machine agents to output the two action policies, and the proximal policy optimization (PPO) and dueling double deep Q-network (D3QN) algorithms are used to train the decision network parameters of the workpiece and machine agents, respectively. Hyperparameters are selected via the orthogonal experiment method, and the proposed method is compared with algorithms from the literature on standard benchmark instances. The experimental results show that the proposed method clearly outperforms the other algorithms in solving the FJSP, further verifying its feasibility and effectiveness.
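The paper's code is not reproduced here, but the architecture the abstract describes can be illustrated. Below is a minimal, hypothetical PyTorch sketch: a hand-rolled GIN layer that embeds the disjunctive-graph state, a policy head for the workpiece agent (the component trained with PPO), and a dueling Q head for the machine agent (D3QN-style). All layer sizes, feature choices, and class names are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GINLayer(nn.Module):
    """One GIN update: h_v <- MLP((1 + eps) * h_v + sum over neighbors of h_u)."""
    def __init__(self, dim):
        super().__init__()
        self.eps = nn.Parameter(torch.zeros(1))
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, h, adj):
        # h: (n_ops, dim) operation-node features; adj: (n_ops, n_ops) dense
        # adjacency of the disjunctive graph (conjunctive + disjunctive arcs)
        return self.mlp((1 + self.eps) * h + adj @ h)

class WorkpieceAgent(nn.Module):
    """Policy head (PPO-trained): a distribution over schedulable operations."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, h, mask):
        # concatenate each node embedding with a mean-pooled graph context
        g = h.mean(dim=0, keepdim=True).expand(h.size(0), -1)
        logits = self.score(torch.cat([h, g], dim=-1)).squeeze(-1)
        logits = logits.masked_fill(~mask, float("-inf"))  # only eligible operations
        return torch.distributions.Categorical(logits=logits)

class MachineAgent(nn.Module):
    """Dueling Q head (D3QN-style): Q-values over compatible machines."""
    def __init__(self, dim, n_machines):
        super().__init__()
        self.value = nn.Linear(dim, 1)
        self.advantage = nn.Linear(dim, n_machines)

    def forward(self, op_embedding):
        # dueling decomposition: Q = V + (A - mean(A))
        a = self.advantage(op_embedding)
        return self.value(op_embedding) + a - a.mean(dim=-1, keepdim=True)
```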
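The reward design is likewise only named in the abstract: each step's reward reflects the change in the minimized makespan, so the episode return maps the overall scheduling quality. A minimal sketch, assuming a hypothetical `predict_makespan` estimator (e.g., a lower bound on the maximum completion time computed from the partial schedule):

```python
# Hypothetical reward shaping per the abstract: the per-step reward is the
# decrease in the predicted makespan, so the return over an episode
# telescopes to the total makespan reduction. `predict_makespan` is an
# assumed helper, not a function from the paper.
def step_reward(predict_makespan, state_t, state_t_next):
    # Positive when the chosen (workpiece, machine) action shrinks the
    # estimated maximum completion time; negative when it grows it.
    return predict_makespan(state_t) - predict_makespan(state_t_next)
```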
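The orthogonal experiment method for hyperparameter selection is also only named; the sketch below shows the general technique using a standard L9(3^4) orthogonal array over four assumed hyperparameters, evaluating nine trials instead of the full 3^4 = 81 grid. The factor names and level values are illustrative assumptions, not the paper's actual design.

```python
# Standard L9(3^4) orthogonal array: 4 factors at 3 levels in 9 trials.
L9 = [  # each row: level index (0-2) for factors (A, B, C, D)
    (0, 0, 0, 0), (0, 1, 1, 1), (0, 2, 2, 2),
    (1, 0, 1, 2), (1, 1, 2, 0), (1, 2, 0, 1),
    (2, 0, 2, 1), (2, 1, 0, 2), (2, 2, 1, 0),
]

levels = {  # assumed hyperparameter levels for illustration only
    "lr":         [1e-4, 3e-4, 1e-3],
    "clip_range": [0.1, 0.2, 0.3],
    "gamma":      [0.95, 0.99, 0.999],
    "batch_size": [64, 128, 256],
}

def run_orthogonal_trials(train_and_eval):
    """train_and_eval(config) -> mean makespan on validation instances."""
    names = list(levels)
    results = []
    for row in L9:
        config = {n: levels[n][lvl] for n, lvl in zip(names, row)}
        results.append((train_and_eval(config), config))
    # pick the configuration achieving the smallest mean makespan
    return min(results, key=lambda r: r[0])
```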

Key words: flexible job shop scheduling problem, graph neural network (GNN), Markov decision process, collaborative agent reinforcement learning, orthogonal experiment method

CLC Number: