Journal of System Simulation ›› 2022, Vol. 34 ›› Issue (6): 1247-1258. DOI: 10.16182/j.issn1004731x.joss.21-0099

• Simulation Modeling Theory and Method •

Application of Improved Q Learning Algorithm in Job Shop Scheduling Problem

Yejian Zhao, Yanhong Wang, Jun Zhang, Hongxia Yu, Zhongda Tian

  1. School of Artificial Intelligence, Shenyang University of Technology, Shenyang 110027, Liaoning, China
  • Received: 2021-02-02  Revised: 2021-03-14  Online: 2022-06-30  Published: 2022-06-16
  • Corresponding author: Yanhong Wang  E-mail: zhao_yejian@163.com; wangyh_sut@163.com
  • About the author: Yejian Zhao (1995-), male, M.S., research interests: shop scheduling and machine learning. E-mail: zhao_yejian@163.com
  • Supported by:
    National Natural Science Foundation of China (61803273); Key R&D Program of Liaoning Province (2020JH2/10100041)

Abstract:

Aiming at the job shop scheduling problem in a dynamic environment, a dynamic scheduling algorithm based on an improved Q learning algorithm and dispatching rules is proposed. The state space of the dynamic scheduling algorithm is described with the concept of "the urgency of remaining tasks", and a reward function following the principle of "the higher the slack, the higher the penalty" is designed. To address the problem that the greedy strategy may still select sub-optimal actions in the later stage of learning, the traditional Q learning algorithm is improved by introducing an action selection strategy built on the Softmax function, which makes the probabilities of selecting different actions more nearly equal in the early stage of learning. Simulation results on 6 different test instances show that the performance indicator of the proposed scheduling algorithm improves by an average of about 6.5% over the algorithm before the improvement, and by about 38.3% and 38.9% over the IPSO and PSO algorithms, respectively. The scheduling results are significantly better than those of conventional methods such as a single dispatching rule or traditional optimization algorithms.
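As an illustration of the Softmax-based action selection described above, the following Python sketch shows how a Boltzmann distribution over Q-values can replace a purely greedy choice: a high temperature makes early action choices nearly uniform, while lowering it later concentrates selection on the best-valued action. This is a minimal sketch under assumed parameters (temperature, learning rate alpha, discount factor gamma, and the example state/action sizes), not the authors' implementation; presumably each action in the paper's setting corresponds to applying one dispatching rule.

```python
import numpy as np

def softmax_action(q_values, temperature=1.0):
    """Choose an action index from a Boltzmann (Softmax) distribution over Q-values.

    A high temperature makes the selection probabilities of different actions
    nearly equal (more exploration early in learning); a low temperature
    concentrates probability on the best-valued action (exploitation later on).
    """
    prefs = np.asarray(q_values, dtype=float) / temperature
    prefs -= prefs.max()                     # shift for numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return np.random.choice(len(probs), p=probs)

def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One-step Q-learning update; alpha and gamma are placeholder values."""
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])

# Hypothetical usage: Q is a |states| x |actions| table, where each action
# stands for one dispatching rule (sizes below are assumptions for the demo).
Q = np.zeros((10, 3))
a = softmax_action(Q[0], temperature=5.0)    # early stage: near-uniform choice
```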

Key words: reinforcement learning, Q learning, dispatching rules, dynamic scheduling, job shop scheduling
