Solving the Vehicle Routing Problem Based on Deep Reinforcement Learning

doi:10.16182/j.issn1004731x.joss.24-0432

Abstract

Abstract:

The capacitated vehicle routing problem (CVRP) is a well-known combinatorial optimization challenge recognized as NP-hard due to its significant complexity. Building upon existing research, this paper introduces a novel end-to-end deep reinforcement learning approach based on a multi-pointer Transformer to tackle the CVRP. The proposed algorithm employs an invertible residual network in the encoder to encode input features, effectively reducing memory consumption. In the decoder, a multi-pointer network determines the probability distribution of solutions. To further enhance the performance of CVRP solutions, the algorithm leverages the symmetry in combinatorial optimization by implementing multi-trajectory parallel processing during both training and inference phases. Additionally, an enhanced contextual embedding method is utilized, and the model is trained using an improved reinforcement learning algorithm. Experimental results demonstrate that the proposed model strikes the best balance between solving speed and quality with lower memory usage compared to current classic heuristic algorithms and other deep learning approaches.

Key words: deep reinforcement learning, vehicle routing problem, invertible residual networks, attention mechanisms, improved REINFORCE algorithm

CLC Number:

TP18

Jiang Ming, He Tao. Solving the Vehicle Routing Problem Based on Deep Reinforcement Learning[J]. Journal of System Simulation, 2025, 37(9): 2177-2187.

Figures/Tables 7

Fig. 1

Fig. 2

Fig. 3

Fig. 4

Fig. 5

Table 1

Fig. 6

References 23

[1]	雷旭, 陈静夷, 陈潇阳. 改进哈里斯鹰算法的仓储机器人路径规划研究[J]. 系统仿真学报, 2024, 36(5): 1081-1092.
	Lei Xu, Chen Jingyi, Chen Xiaoyang. Research on Path Planning of Warehouse Robot with Improved Harris Hawks Algorithm[J]. Journal of System Simulation, 2024, 36(5): 1081-1092.
[2]	Dantzig G B, Ramser J H. The Truck Dispatching Problem[J]. Management Science, 1959, 6(1): 80-91.
[3]	王小康, 冀杰, 刘洋, 等. 基于改进Q学习算法的无人物流配送车路径规划[J]. 系统仿真学报, 2024, 36(5): 1211-1221.
	Wang Xiaokang, Ji Jie, Liu Yang, et al. Path Planning of Unmanned Delivery Vehicle Based on Improved Q-learning Algorithm[J]. Journal of System Simulation, 2024, 36(5): 1211-1221.
[4]	王扬, 陈智斌, 吴兆蕊, 等. 强化学习求解组合最优化问题的研究综述[J]. 计算机科学与探索, 2022, 16(2): 261-279.
	Wang Yang, Chen Zhibin, Wu Zhaorui, et al. Review of Reinforcement Learning for Combinatorial Optimization Problem[J]. Journal of Frontiers of Computer Science & Technology, 2022, 16(2): 261-279.
[5]	Feng Shuo, Sun Haowei, Yan Xintao, et al. Dense Reinforcement Learning for Safety Validation of Autonomous Vehicles[J]. Nature, 2023, 615(7953): 620-627.
[6]	杨来义, 毕敬, 苑海涛. 基于SAC算法的移动机器人智能路径规划[J]. 系统仿真学报, 2023, 35(8): 1726-1736.
	Yang Laiyi, Bi Jing, Yuan Haitao. Intelligent Path Planning for Mobile Robots Based on SAC Algorithm[J]. Journal of System Simulation, 2023, 35(8): 1726-1736.
[7]	杨笑笑, 柯琳, 陈智斌. 深度强化学习求解车辆路径问题的研究综述[J]. 计算机工程与应用, 2023, 59(5): 1-13.
	Yang Xiaoxiao, Ke Lin, Chen Zhibin. Review of Deep Reinforcement Learning Model Research on Vehicle Routing Problems[J]. Computer Engineering and Applications, 2023, 59(5): 1-13.
[8]	Chen Xinyun, Tian Yuandong. Learning to Perform Local Rewriting for Combinatorial Optimization[C]//Proceedings of the 33rd International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2019: 6281-6292.
[9]	黄琰, 张锦. 基于深度强化学习的车辆路径问题求解方法[J]. 交通运输工程与信息学报, 2022, 20(3): 114-127.
	Huang Yan, Zhang Jin. Solving Vehicle Routing Problem Using Deep Reinforcement Learning[J]. Journal of Transportation Engineering and Information, 2022, 20(3): 114-127.
[10]	Duan Lu, Zhan Yang, Hu Haoyuan, et al. Efficiently Solving the Practical Vehicle Routing Problem: A Novel Joint Learning Approach[C]//Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. New York: ACM, 2020: 3054-3063.
[11]	Wen Haomin, Lin Youfang, Mao Xiaowei, et al. Graph2Route: A Dynamic Spatial-temporal Graph Neural Network for Pick-up and Delivery Route Prediction[C]//Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. New York: ACM, 2022: 4143-4152.
[12]	Zhang Cong, Wu Fan, Wang He, et al. A Meta-learning Algorithm for Rebalancing the Bike-sharing System in IoT Smart City[J]. IEEE Internet of Things Journal, 2022, 9(21): 21073-21085.
[13]	Qin Zhiwei, Zhu Hongtu, Ye Jieping. Reinforcement Learning for Ridesharing: An Extended Survey[J]. Transportation Research Part C: Emerging Technologies, 2022, 144: 103852.
[14]	Hopfield J J, Tank D W. "Neural" Computation of Decisions in Optimization Problems[J]. Biological Cybernetics, 1985, 52(3): 141-152.
[15]	Vinyals O, Fortunato M, Jaitly N. Pointer Networks[C]//Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2. Cambridge, MA, USA: MIT Press, 2015: 2692-2700.
[16]	Bello I, Pham H, Le Q V, et al. Neural Combinatorial Optimization with Reinforcement Learning[C]//5th International Conference on Learning Representations. [S.l. ]: ICLR, 2017: 1-15.
[17]	Nazari M, Oroojlooy A, Takáč Martin, et al. Reinforcement Learning for Solving the Vehicle Routing Problem[C]//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2018: 9861-9871.
[18]	Kool W, van Hoof H, Welling M. Attention, Learn to Solve Routing Problems![C]//International Conference on Learning Representations, 2018.
[19]	Peng Bo, Wang Jiahai, Zhang Zizhen. A Deep Reinforcement Learning Algorithm Using Dynamic Attention Model for Vehicle Routing Problems[C]//Artificial Intelligence Algorithms and Applications. Singapore: Springer Singapore, 2020: 636-650.
[20]	Kwon Yeong-Dae, Choo Jinho, Kim Byoungjip, et al. Pomo: Policy Optimization with Multiple Optima for Reinforcement Learning[C]//Proceedings of the 34th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2020: 21188-21198.
[21]	Xu Yunqiu, Fang Meng, Chen Ling, et al. Reinforcement Learning with Multiple Relational Attention for Solving Vehicle Routing Problems[J]. IEEE Transactions on Cybernetics, 2022, 52(10): 11107-11120.
[22]	Xin Liang, Song Wen, Cao Zhiguang, et al. Multi-decoder Attention Model with Embedding Glimpse for Solving Vehicle Routing Problems[C]//Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence and The Thirty-Third Conference on Innovative Applications of Artificial Intelligence and The Eleventh Symposium on Educational Advances in Artificial Intelligence. Palo Alto: AAAI Press, 2021: 12042-12049.
[23]	Jin Yan, Ding Yuandong, Pan Xuanhao, et al. Pointerformer: Deep Reinforced Multi-pointer Transformer for the Traveling Salesman Problem[C]//Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence. Palo Alto: AAAI Press, 2023: 8132-8140.

方法	CVRP20			CVRP50			CVRP100
方法	路径距离	最优间隙/%	求解时间	路径距离	最优间隙/%	求解时间	路径距离	最优间隙/%	求解时间
LKH3	6.11	0	1 h	10.38	0	5 h	15.68	0	9 h
OR Tools	6.43	5.26	2 min	11.31	8.95	12 min	17.16	9.43	1 h
NeuRewriter	6.16	0.74	18 min	10.51	1.25	22 min	16.10	2.71	1 h
AM (greedy)	6.40	4.75	1 s	10.98	5.78	2 s	16.76	6.89	6 s
POMO	6.14	0.21	5 s	10.42	0.45	26 s	15.73	0.32	2 min
MRAM (greedy)	6.39	4.77	2 s	10.95	5.49	3 s	16.69	6.43	7 s
MDAM (greedy)	6.24	1.79	7 s	10.74	3.47	16 s	16.40	4.86	45 s
Rev-DPN	6.15	0.61	1 s	10.45	0.65	1 s	15.86	1.13	2 s