Journal of System Simulation (系统仿真学报) ›› 2021, Vol. 33 ›› Issue (12): 2782-2791. doi: 10.16182/j.issn1004731x.joss.21-FZ0774

• Review •


Brief Review on Applying Reinforcement Learning to Job Shop Scheduling Problems

Wang Xiaohan1,2, Zhang Lin1,2, Ren Lei1,2, Xie Kunyu1,2, Wang Kunyu1,2, Ye Fei1,2, Chen Zhen1,2   

  1. Beihang University, Beijing 100191, China;
    2. Engineering Research Center of Complex Product Advanced Manufacturing Systems, Ministry of Education, Beijing 100191, China
  • Received: 2021-05-10 Revised: 2021-07-29 Online: 2021-12-18 Published: 2022-01-13
  • About the author: Wang Xiaohan (1998-), male, Ph.D. candidate; research interests: discrete simulation, multi-agent systems, and reinforcement learning. E-mail: by1903042@buaa.edu.cn
  • Funding: National Key Research and Development Program of China (2018YFB1701600); National Natural Science Foundation of China (61873014)


Abstract: Reinforcement Learning (RL) achieves shorter response times and better model generalization in the Job Shop Scheduling Problem (JSSP). To outline the overall research status of RL-based job shop scheduling, summarize the scheduling frameworks currently built on RL, and lay a foundation for follow-up research, the backgrounds of JSSP and RL are first introduced. Two simulation techniques commonly used in JSSP are analyzed, and two frameworks commonly used when applying RL to JSSP are presented. In addition, some existing challenges of applying RL to JSSP are pointed out, and related research progress is reviewed from three aspects: direct scheduling, feature representation-based scheduling, and parameter search-based scheduling.
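One widely used way of casting JSSP as an RL problem is to let an agent pick a dispatching rule at each scheduling decision point and reward it with the negative makespan. The minimal sketch below illustrates that idea only; it is not taken from the paper, and the toy job data, the SPT/LPT action set, the crude state encoding, and the tabular Monte-Carlo-style update are all illustrative assumptions.

import random
from collections import defaultdict

JOBS = [  # each job is a list of (machine, processing_time) operations
    [(0, 3), (1, 2), (2, 2)],
    [(0, 2), (2, 1), (1, 4)],
    [(1, 4), (2, 3), (0, 1)],
]
RULES = ["SPT", "LPT"]  # candidate dispatching rules (the agent's actions)

def eps_greedy(state, q_table, epsilon):
    # explore with probability epsilon, otherwise pick the best-valued rule
    if random.random() < epsilon:
        return random.choice(RULES)
    return max(RULES, key=lambda a: q_table[(state, a)])

def run_episode(q_table, epsilon):
    """Simulate one schedule, choosing a dispatching rule at every decision point."""
    next_op = [0] * len(JOBS)        # index of the next unscheduled operation per job
    job_ready = [0] * len(JOBS)      # earliest start time of each job's next operation
    mach_ready = defaultdict(int)    # time at which each machine becomes free
    visited = []                     # (state, action) pairs for the learning update
    while any(k < len(job) for k, job in zip(next_op, JOBS)):
        candidates = [j for j, job in enumerate(JOBS) if next_op[j] < len(job)]
        state = tuple(next_op)       # crude state: how far every job has progressed
        rule = eps_greedy(state, q_table, epsilon)
        ptime = lambda j: JOBS[j][next_op[j]][1]
        job = min(candidates, key=ptime) if rule == "SPT" else max(candidates, key=ptime)
        machine, duration = JOBS[job][next_op[job]]
        start = max(job_ready[job], mach_ready[machine])
        job_ready[job] = mach_ready[machine] = start + duration
        next_op[job] += 1
        visited.append((state, rule))
    return visited, max(job_ready)   # makespan = completion time of the last job

q_table = defaultdict(float)
for episode in range(500):           # Monte-Carlo-style update with reward = -makespan
    visited, makespan = run_episode(q_table, epsilon=0.1)
    for state, rule in visited:
        q_table[(state, rule)] += 0.1 * (-makespan - q_table[(state, rule)])

_, best = run_episode(q_table, epsilon=0.0)
print("Makespan of the learned rule sequence:", best)

Deep-RL variants typically replace this tabular Q-function with a neural network (for example, a graph neural network over richer state features) that maps the shop state to rule or operation choices.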

Key words: reinforcement learning application, job shop scheduling problem, graph neural network, combinatorial optimization, deep learning, feature representation

CLC number: