Solving the Capacitated Vehicle Routing Problem Based on Deep Reinforcement Learning

doi:10.16182/j.issn1004731x.joss.24-0432

Abstract:

The capacitated vehicle routing problem (CVRP) is a well-known combinatorial optimization problem that is NP-hard. Building on existing research, this paper proposes a novel end-to-end deep reinforcement learning method based on a multi-pointer Transformer to solve the CVRP. In the encoder, the model uses an invertible residual network to encode the input features, which reduces memory consumption; in the decoder, a multi-pointer network produces the probability distribution over solutions. To further improve solution quality, the method exploits the symmetry of combinatorial optimization problems by processing multiple trajectories in parallel during both training and inference, adopts an enhanced context embedding, and is trained with an improved reinforcement learning algorithm. Experimental results show that, compared with classic heuristic algorithms and other deep learning methods, the proposed model achieves the best balance between solving speed and solution quality while training with lower memory consumption.
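The memory saving attributed to the invertible residual encoder can be illustrated with a minimal sketch. It assumes the common two-stream coupling y1 = x1 + f(x2), y2 = x2 + g(y1); the abstract does not state the exact formulation, and `make_sublayer`, `rev_forward`, and `rev_inverse` are hypothetical names in which a random tanh layer stands in for the attention and feed-forward sublayers. Because the inputs can be recomputed from the outputs, intermediate activations need not be kept in memory for backpropagation.

```python
import math
import random

def make_sublayer(dim, rng):
    """Hypothetical stand-in for an attention or feed-forward sublayer: x -> tanh(W x)."""
    w = [[rng.gauss(0.0, 0.3) for _ in range(dim)] for _ in range(dim)]
    def sublayer(x):
        return [math.tanh(sum(wij * xj for wij, xj in zip(row, x))) for row in w]
    return sublayer

def rev_forward(x1, x2, f, g):
    """Assumed reversible coupling: y1 = x1 + f(x2), y2 = x2 + g(y1)."""
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2

def rev_inverse(y1, y2, f, g):
    """Recompute the block inputs from its outputs instead of storing them."""
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2

rng = random.Random(0)
dim = 4
f, g = make_sublayer(dim, rng), make_sublayer(dim, rng)
x1 = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
x2 = [rng.uniform(-1.0, 1.0) for _ in range(dim)]

y1, y2 = rev_forward(x1, x2, f, g)
r1, r2 = rev_inverse(y1, y2, f, g)

# Largest absolute difference between the original and the recomputed inputs.
max_error = max(abs(a - b) for a, b in zip(x1 + x2, r1 + r2))
print(f"max reconstruction error: {max_error:.2e}")
```

The reconstruction error is at the level of floating-point rounding, which is the property that allows a training framework to discard the activations of such a block and recompute them during the backward pass.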

Key words: deep reinforcement learning, vehicle routing problem, invertible residual networks, attention mechanisms, improved REINFORCE algorithm
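
The abstract does not spell out the improved REINFORCE algorithm. One plausible reading, offered here only as an assumption, is a shared baseline in the style of POMO: several trajectories are rolled out for the same instance, and their mean tour length serves as the baseline for each of them. The sketch below computes the resulting advantages and the surrogate loss; the function names and the numbers are hypothetical.

```python
def shared_baseline_advantages(lengths):
    """Advantage of each trajectory relative to the mean tour length of its instance."""
    baseline = sum(lengths) / len(lengths)
    return [length - baseline for length in lengths]

def reinforce_surrogate(lengths, log_probs):
    """Surrogate loss mean((L_i - b) * log p_i); minimizing it favors shorter tours."""
    advantages = shared_baseline_advantages(lengths)
    return sum(a * lp for a, lp in zip(advantages, log_probs)) / len(lengths)

# Hypothetical tour lengths and log-probabilities of four trajectories
# sampled for one CVRP instance.
lengths = [6.4, 6.1, 6.9, 6.2]
log_probs = [-2.0, -1.5, -3.0, -1.8]

advantages = shared_baseline_advantages(lengths)
loss = reinforce_surrogate(lengths, log_probs)
print(advantages, loss)
```

With these numbers the baseline is 6.4, so the advantages are approximately 0.0, -0.3, 0.5, and -0.2. Gradient descent on the surrogate raises the log-probability of trajectories with negative advantage (shorter than average) and lowers it for the others, without requiring a separate critic network.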

CLC number: TP18

Jiang Ming, He Tao. Solving the Vehicle Routing Problem Based on Deep Reinforcement Learning[J]. Journal of System Simulation, 2025, 37(9): 2177-2187.

Table 1
Method	CVRP20 path length	CVRP20 optimality gap (%)	CVRP20 solving time	CVRP50 path length	CVRP50 optimality gap (%)	CVRP50 solving time	CVRP100 path length	CVRP100 optimality gap (%)	CVRP100 solving time
LKH3	6.11	0	1 h	10.38	0	5 h	15.68	0	9 h
OR Tools	6.43	5.26	2 min	11.31	8.95	12 min	17.16	9.43	1 h
NeuRewriter	6.16	0.74	18 min	10.51	1.25	22 min	16.10	2.71	1 h
AM (greedy)	6.40	4.75	1 s	10.98	5.78	2 s	16.76	6.89	6 s
POMO	6.14	0.21	5 s	10.42	0.45	26 s	15.73	0.32	2 min
MRAM (greedy)	6.39	4.77	2 s	10.95	5.49	3 s	16.69	6.43	7 s
MDAM (greedy)	6.24	1.79	7 s	10.74	3.47	16 s	16.40	4.86	45 s
Rev-DPN	6.15	0.61	1 s	10.45	0.65	1 s	15.86	1.13	2 s
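
The page does not define the optimality gap column. A common definition, assumed here, measures the relative excess of a method's average path length over a reference solver, with LKH3 taken as the reference because its gap is listed as 0. The symbols L_method and L_ref are introduced only for this illustration.

```latex
\mathrm{gap} = \frac{L_{\mathrm{method}} - L_{\mathrm{ref}}}{L_{\mathrm{ref}}} \times 100\%,
\qquad
\text{e.g. AM (greedy) on CVRP20: } \frac{6.40 - 6.11}{6.11} \times 100\% \approx 4.75\%
```

Other entries may differ from such a recomputation, since the table shows path lengths rounded to two decimals and the page does not state how the gaps were obtained.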