Journal of System Simulation ›› 2024, Vol. 36 ›› Issue (3): 770-781. doi: 10.16182/j.issn1004731x.joss.22-1320

• Paper

基于改进D3QN的煤炭码头卸车排产智能优化方法

秦保新1, 张羽霄2, 吴思锐2, 曹卫冲1, 李湛2,3

  1. 国能(天津)港务有限责任公司, 天津 300450
  2. 哈尔滨工业大学 航天学院智能控制与系统研究所, 黑龙江 哈尔滨 150001
  3. 鹏城实验室 数学与理论部, 广东 深圳 518055
  • 收稿日期:2022-11-05 修回日期:2023-04-24 出版日期:2024-03-15 发布日期:2024-03-14
  • 通讯作者: 李湛 E-mail:11620065@chnenergy.com.cn;zhanli@hit.edu.cn
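  • About the first author: Qin Baoxin (1976-), male, senior engineer, master's degree; research interests: intelligent large-scale machinery, terminal equipment, and production management. E-mail: 11620065@chnenergy.com.cn
  • Supported by:
    National Natural Science Foundation of China (62273122)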

Intelligent Optimization of Coal Terminal Unloading Scheduling Based on Improved D3QN Algorithm

Qin Baoxin1, Zhang Yuxiao2, Wu Sirui2, Cao Weichong1, Li Zhan2,3

  1. Guoneng (Tianjin) Port Co., Ltd., Tianjin 300450, China
  2. Research Institute of Intelligent Control and Systems, Harbin Institute of Technology, Harbin 150001, China
  3. Department of Mathematics and Theory, Peng Cheng Laboratory, Shenzhen 518055, China
  • Received: 2022-11-05  Revised: 2023-04-24  Online: 2024-03-15  Published: 2024-03-14
  • Contact: Li Zhan  E-mail: 11620065@chnenergy.com.cn; zhanli@hit.edu.cn

摘要:

采用智能化决策排产能够提高大型港口的运营效率,是人工智能技术在智慧港口场景落地的重要研究方向之一。针对煤炭码头卸车智能排产任务,将其抽象为马尔可夫序列决策问题。建立了该问题的深度强化学习模型,并针对该模型中动作空间维度高且可行动作稀疏的特点,提出一种改进的D3QN算法,实现了卸车排产调度决策的智能优化。仿真结果表明,对于同一组随机任务序列,优化后的排产策略相比随机策略实现了明显的效率提升。同时,将训练好的排产策略应用于随机生成的新任务序列,可实现5%~7%的排产效率提升,表明该优化方法具有较好的泛化能力。此外,随着决策模型复杂度的提升,传统启发式优化算法面临建模困难、求解效率低等突出问题。所提算法为该类问题的研究提供了一种新思路,有望实现深度强化学习智能决策在港口排产任务中的更广泛应用。

关键词: 码头卸车排产, 调度策略优化, 智能决策, 深度强化学习, Dueling Double DQN算法

Abstract:

Intelligent scheduling decisions can improve the operating efficiency of large ports and are an important research direction for deploying artificial intelligence in smart-port scenarios. This paper studies intelligent unloading scheduling at coal terminals and formulates the task as a Markov sequential decision problem. A deep reinforcement learning model of the problem is established and, because the model has a high-dimensional action space in which feasible actions are sparse, an improved D3QN algorithm is proposed to optimize unloading scheduling decisions. Simulation results show that, on the same set of random task sequences, the optimized scheduling policy is clearly more efficient than a random policy. When the trained policy is applied directly to newly generated random task sequences, it improves scheduling efficiency by 5%~7%, which indicates that the method generalizes well. Moreover, as decision models grow more complex, traditional heuristic optimization algorithms face prominent problems such as difficult modeling and low solution efficiency. The proposed algorithm offers a new approach to this class of problems and is expected to enable wider application of deep reinforcement learning-based intelligent decision-making in port scheduling tasks.
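
For readers unfamiliar with D3QN (Dueling Double DQN), the following is a minimal NumPy sketch of its two standard ingredients: the dueling aggregation of a state value and per-action advantages, and the Double DQN target in which the online network selects the next action and the target network evaluates it. The abstract does not describe how the improved algorithm handles sparse feasible actions, so the boolean feasibility mask used here is only an assumption, and all function and parameter names (`dueling_q`, `masked_argmax`, `double_dqn_target`, `feasible`) are hypothetical rather than taken from the paper.

```python
import numpy as np


def dueling_q(value, advantage):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).

    value has shape (batch, 1); advantage has shape (batch, n_actions).
    """
    return value + advantage - advantage.mean(axis=-1, keepdims=True)


def masked_argmax(q, feasible):
    """Greedy action restricted to feasible actions.

    The mask is an assumption of this sketch, not the paper's stated method.
    Each row of `feasible` is assumed to contain at least one True entry.
    """
    return np.where(feasible, q, -np.inf).argmax(axis=-1)


def double_dqn_target(reward, done, q_online_next, q_target_next,
                      feasible_next, gamma=0.99):
    """Double DQN target: the online net selects, the target net evaluates."""
    a_star = masked_argmax(q_online_next, feasible_next)
    q_eval = np.take_along_axis(q_target_next, a_star[:, None], axis=-1)[:, 0]
    # Terminal transitions (done == 1) do not bootstrap from the next state.
    return reward + gamma * (1.0 - done) * q_eval


if __name__ == "__main__":
    q_next = dueling_q(np.array([[1.0]]), np.array([[0.0, 2.0, 4.0]]))
    feasible = np.array([[True, True, False]])  # action 2 is infeasible
    target = double_dqn_target(
        reward=np.array([0.5]),
        done=np.array([0.0]),
        q_online_next=q_next,
        q_target_next=np.array([[10.0, 20.0, 30.0]]),
        feasible_next=feasible,
        gamma=0.9,
    )
    print(q_next, masked_argmax(q_next, feasible), target)
```

Masking before the argmax keeps an infeasible action from being selected even when its estimated Q-value is the largest, which is one common way to cope with a large action space in which few actions are valid at each step.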

Key words: terminal unloading scheduling, scheduling strategy optimization, intelligent decision-making, deep reinforcement learning, Dueling Double DQN algorithm

CLC number: