系统仿真学报 ›› 2025, Vol. 37 ›› Issue (9): 2177-2187.doi: 10.16182/j.issn1004731x.joss.24-0432

• 论文 •    

基于深度强化学习的带容量约束车辆路径问题求解

江明, 何韬   

  1. 福建理工大学 互联网经贸学院,福建 福州 350118
  • 收稿日期:2024-04-23 修回日期:2024-05-31 出版日期:2025-09-18 发布日期:2025-09-22
  • 通讯作者: 何韬
  • 第一作者简介:江明(1984-),男,副教授,博士,研究方向为数字孪生、IT治理研究。
  • 基金资助:
    福建省省科技厅-省创新战略研究项目(2023R0060)

Solving the Vehicle Routing Problem Based on Deep Reinforcement Learning

Jiang Ming, He Tao   

  1. School of Internet Economics and Business, Fujian University of Technology, Fuzhou 350014, China
  • Received:2024-04-23 Revised:2024-05-31 Online:2025-09-18 Published:2025-09-22
  • Contact: He Tao

摘要:

带容量约束的车辆路径问题(capacitated vehicle routing problem,CVRP)是一种著名的组合优化问题,被称为NP-hard问题,具有高度的复杂性。在现有研究的基础上,提出了一种新颖的基于多指针Transformer端到端深度强化学习方法来解决CVRP。算法模型在编码器中采用了可逆残差网络对输入的特征进行编码,减少了内存资源的消耗,在解码器中采用了多指针网络求出解的概率分布,为了进一步提高CVRP解决方案的性能,利用组合优化问题的对称性,在训练和推理阶段进行多轨迹并行处理,采用了增强的上下文嵌入方法,通过改进的强化学习算法进行训练。实验结果表明:所提算法模型对比当前经典的启发式算法和其他深度学习方法,在较低的内存消耗训练下,求解速度和求解质量之间取得了最好的平衡。

关键词: 深度强化学习, 车辆路径问题, 可逆残差网络, 注意力机制, 改进的REINFORCE算法

Abstract:

The capacitated vehicle routing problem (CVRP) is a well-known combinatorial optimization challenge recognized as NP-hard due to its significant complexity. Building upon existing research, this paper introduces a novel end-to-end deep reinforcement learning approach based on a multi-pointer Transformer to tackle the CVRP. The proposed algorithm employs an invertible residual network in the encoder to encode input features, effectively reducing memory consumption. In the decoder, a multi-pointer network determines the probability distribution of solutions. To further enhance the performance of CVRP solutions, the algorithm leverages the symmetry in combinatorial optimization by implementing multi-trajectory parallel processing during both training and inference phases. Additionally, an enhanced contextual embedding method is utilized, and the model is trained using an improved reinforcement learning algorithm. Experimental results demonstrate that the proposed model strikes the best balance between solving speed and quality with lower memory usage compared to current classic heuristic algorithms and other deep learning approaches.

Key words: deep reinforcement learning, vehicle routing problem, invertible residual networks, attention mechanisms, improved REINFORCE algorithm

中图分类号: