Journal of System Simulation, 2023, Vol. 35, Issue 11: 2345-2358. doi: 10.16182/j.issn1004731x.joss.22-0666


  • First author: Ni Jing (b. 1972), female, associate professor, Ph.D.; research interests: online social networks and intelligent optimization algorithms. E-mail: nijing501@126.com
  • Funding:
    Humanities and Social Sciences Fund of the Ministry of Education (19YJAZH064)

Intercell Dynamic Scheduling Method Based on Deep Reinforcement Learning

Ni Jing, Ma Mengke

  1. University of Shanghai for Science and Technology, Shanghai 200093, China
  • Received:2022-06-20 Revised:2022-09-04 Online:2023-11-25 Published:2023-11-24


Abstract:

To solve the intercell scheduling problem with dynamically arriving machining tasks, and to achieve adaptive scheduling in the complex and changeable environment of an intelligent workshop, a scheduling method based on a deep Q-network is proposed. A complex network is constructed with cells as nodes and the intercell machining paths of workpieces as directed edges, and the degree value is introduced to define a state space that captures intercell scheduling characteristics. A compound scheduling rule composed of a workpiece layer, a cell layer, and a machine layer is designed; this hierarchical optimization makes the scheduling scheme more global. Since double deep Q-networks (DDQN) still select sub-optimal actions in the later stages of training, a search strategy built around an exponential function is proposed. Simulation experiments of different scales verify that the proposed method can cope with changeable dynamic environments and quickly generate near-optimal scheduling schemes.
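Two elements of the abstract can be sketched concretely: the degree-value state features (cells as nodes, intercell machining paths as directed edges) and an exponentially decaying exploration rate that suppresses random, potentially sub-optimal actions late in training. The toy cell network, the decay constants, and the function names below are illustrative assumptions, not values or code from the paper.

```python
import math

def epsilon(step, eps_min=0.01, eps_max=1.0, k=1e-3):
    # Exploration schedule with an exponential function as its main body:
    # near eps_max early on, decaying toward eps_min as training proceeds,
    # so the agent rarely takes random (possibly sub-optimal) actions late.
    return eps_min + (eps_max - eps_min) * math.exp(-k * step)

def degree_state(routes, cells):
    # Cells are nodes; each workpiece's intercell machining path contributes
    # directed edges. The in-/out-degree of every cell is collected into a
    # fixed-length state vector reflecting intercell load and flow.
    out_deg = {c: 0 for c in cells}
    in_deg = {c: 0 for c in cells}
    for path in routes:                      # e.g. ["C1", "C3", "C2"]
        for a, b in zip(path, path[1:]):     # consecutive cells form an edge
            out_deg[a] += 1
            in_deg[b] += 1
    return [out_deg[c] for c in cells] + [in_deg[c] for c in cells]

# Toy example: three cells, two workpieces with intercell routes.
cells = ["C1", "C2", "C3"]
routes = [["C1", "C3", "C2"], ["C2", "C3"]]
print(degree_state(routes, cells))   # → [1, 1, 1, 0, 1, 2]
print(epsilon(0))                    # → 1.0 (full exploration at the start)
```

In a DDQN loop, the agent would draw a uniform random number each step and pick a random compound rule when it falls below `epsilon(step)`, otherwise the greedy action from the online network.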

Key words: intercell scheduling, dynamic scheduling, reinforcement learning, degree value, compound rule
