Journal of System Simulation ›› 2024, Vol. 36 ›› Issue (11): 2699-2711. DOI: 10.16182/j.issn1004731x.joss.23-0978

• Research Paper •


Flexible Job Shop Scheduling Method Based on Collaborative Agent Reinforcement Learning Algorithm

Li Jian1,2, Li Huankun1, He Pengbo1, Wang Huabei1, Xu Liping1,2, He Kui1,2   

  1. School of Mechatronics Engineering, Henan University of Science and Technology, Luoyang 471000, China
  2. Henan Collaborative Innovation Center for Advanced Manufacturing of Mechanical Equipment, Luoyang 471000, China
  • Received: 2023-08-04  Revised: 2023-08-24  Online: 2024-11-13  Published: 2024-11-19
  • First author: Li Jian (1972-), male, associate professor, Ph.D.; research interest: intelligent factories.
  • Funding:
    National Key R&D Program of China (2018YFB1701205); Science and Technology Research Project of Henan Province (212102210356)


Abstract:

To improve the efficiency of flexible job shop scheduling, a Markov decision process incorporating the constraints of the flexible job shop scheduling problem (FJSP) is constructed, and a collaborative agent reinforcement learning method is proposed to solve the coupled problem of simultaneously selecting workpieces and machines. In constructing the Markov decision process, a disjunctive graph is introduced to represent the state features, two agents are employed to select workpieces and machines, and the reward over the entire scheduling process is mapped by predicting the difference in the minimized maximum completion time (makespan) between successive decision points. In the solving procedure, a graph isomorphism network (GIN) is embedded to extract state features, encoder-decoder components are set up for the workpiece and machine agents to output the two action policies, and the proximal policy optimization (PPO) and dueling double deep Q-network (D3QN) algorithms are used to train the decision network parameters of the workpiece and machine agents, respectively. Hyperparameters are selected via the orthogonal experiment method, and the proposed method is compared with algorithms from the literature on standard benchmark instances. The experimental results show that the proposed method clearly outperforms the other algorithms in solving the FJSP, further verifying its feasibility and effectiveness.
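The paper's code is not reproduced here, but the architecture the abstract describes can be illustrated. Below is a minimal, hypothetical PyTorch sketch: a hand-rolled GIN layer that embeds the disjunctive-graph state, a policy head for the workpiece agent (the component trained with PPO), and a dueling Q head for the machine agent (D3QN-style). All layer sizes, feature choices, and class names are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GINLayer(nn.Module):
    """One GIN update: h_v <- MLP((1 + eps) * h_v + sum over neighbors of h_u)."""
    def __init__(self, dim):
        super().__init__()
        self.eps = nn.Parameter(torch.zeros(1))
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, h, adj):
        # h: (n_ops, dim) operation-node features; adj: (n_ops, n_ops) dense
        # adjacency of the disjunctive graph (conjunctive + disjunctive arcs)
        return self.mlp((1 + self.eps) * h + adj @ h)

class WorkpieceAgent(nn.Module):
    """Policy head (PPO-trained): a distribution over schedulable operations."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, h, mask):
        # concatenate each node embedding with a mean-pooled graph context
        g = h.mean(dim=0, keepdim=True).expand(h.size(0), -1)
        logits = self.score(torch.cat([h, g], dim=-1)).squeeze(-1)
        logits = logits.masked_fill(~mask, float("-inf"))  # only eligible operations
        return torch.distributions.Categorical(logits=logits)

class MachineAgent(nn.Module):
    """Dueling Q head (D3QN-style): Q-values over compatible machines."""
    def __init__(self, dim, n_machines):
        super().__init__()
        self.value = nn.Linear(dim, 1)
        self.advantage = nn.Linear(dim, n_machines)

    def forward(self, op_embedding):
        # dueling decomposition: Q = V + (A - mean(A))
        a = self.advantage(op_embedding)
        return self.value(op_embedding) + a - a.mean(dim=-1, keepdim=True)
```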
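The reward design is likewise only named in the abstract: each step's reward reflects the change in the minimized makespan, so the episode return maps the overall scheduling quality. A minimal sketch, assuming a hypothetical `predict_makespan` estimator (e.g., a lower bound on the maximum completion time computed from the partial schedule):

```python
# Hypothetical reward shaping per the abstract: the per-step reward is the
# decrease in the predicted makespan, so the return over an episode
# telescopes to the total makespan reduction. `predict_makespan` is an
# assumed helper, not a function from the paper.
def step_reward(predict_makespan, state_t, state_t_next):
    # Positive when the chosen (workpiece, machine) action shrinks the
    # estimated maximum completion time; negative when it grows it.
    return predict_makespan(state_t) - predict_makespan(state_t_next)
```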
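The orthogonal experiment method for hyperparameter selection is also only named; the sketch below shows the general technique using a standard L9(3^4) orthogonal array over four assumed hyperparameters, evaluating nine trials instead of the full 3^4 = 81 grid. The factor names and level values are illustrative assumptions, not the paper's actual design.

```python
# Standard L9(3^4) orthogonal array: 4 factors at 3 levels in 9 trials.
L9 = [  # each row: level index (0-2) for factors (A, B, C, D)
    (0, 0, 0, 0), (0, 1, 1, 1), (0, 2, 2, 2),
    (1, 0, 1, 2), (1, 1, 2, 0), (1, 2, 0, 1),
    (2, 0, 2, 1), (2, 1, 0, 2), (2, 2, 1, 0),
]

levels = {  # assumed hyperparameter levels for illustration only
    "lr":         [1e-4, 3e-4, 1e-3],
    "clip_range": [0.1, 0.2, 0.3],
    "gamma":      [0.95, 0.99, 0.999],
    "batch_size": [64, 128, 256],
}

def run_orthogonal_trials(train_and_eval):
    """train_and_eval(config) -> mean makespan on validation instances."""
    names = list(levels)
    results = []
    for row in L9:
        config = {n: levels[n][lvl] for n, lvl in zip(names, row)}
        results.append((train_and_eval(config), config))
    # pick the configuration achieving the smallest mean makespan
    return min(results, key=lambda r: r[0])
```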

Key words: flexible job shop scheduling problem, graph neural network (GNN), Markov decision process, collaborative agent reinforcement learning, orthogonal experiment method

CLC Number: