Journal of System Simulation ›› 2026, Vol. 38 ›› Issue (2): 360-371. doi: 10.16182/j.issn1004731x.joss.25-0595
• Machine Learning Algorithms •
Yang Can, Chen Kai, Zhu Feng
Received: 2025-06-24
Revised: 2025-09-07
Online: 2026-02-18
Published: 2026-02-11
Contact: Zhu Feng
Yang Can, Chen Kai, Zhu Feng. Reinforcement Learning Based Method for UAV Team Orienteering Optimization under Multi-constraint Condition[J]. Journal of System Simulation, 2026, 38(2): 360-371.
Table 2
Comparison among different methods on multi-scenario UAV-TOP problems
| Scenario | Angle constraint | Number of threat zones | MIFDAM model | AM model | OR-tools | PyVRP |
|---|---|---|---|---|---|---|
| 50-3 | Yes | 3 | 15.49 (<1 s) | 14.63 (<1 s) | 13.99 (10 s) | N/A |
| 50-3 | Yes | 4 | 14.76 (<1 s) | 13.80 (<1 s) | 13.65 (10 s) | N/A |
| 50-3 | Yes | 5 | 14.17 (<1 s) | 13.33 (<1 s) | 12.21 (10 s) | N/A |
| 50-3 | No | 3 | 20.04 (<1 s) | 19.48 (<1 s) | 20.01 (1 s) | 18.89 (1 s) |
| 50-3 | No | 4 | 19.51 (<1 s) | 18.79 (<1 s) | 19.92 (1 s) | 18.73 (1 s) |
| 50-3 | No | 5 | 18.96 (<1 s) | 18.03 (<1 s) | 19.86 (1 s) | 18.50 (1 s) |
| 100-5 | Yes | 3 | 32.37 (<1 s) | 30.17 (<1 s) | 29.13 (200 s) | N/A |
| 100-5 | Yes | 4 | 31.20 (<1 s) | 28.90 (<1 s) | 29.08 (200 s) | N/A |
| 100-5 | Yes | 5 | 30.01 (<1 s) | 28.62 (<1 s) | 28.36 (200 s) | N/A |
| 100-5 | No | 3 | 39.75 (<1 s) | 38.41 (<1 s) | 38.57 (1 s) | 33.16 (1 s) |
| 100-5 | No | 4 | 38.95 (<1 s) | 37.76 (<1 s) | 38.54 (1 s) | 31.10 (1 s) |
| 100-5 | No | 5 | 38.27 (<1 s) | 37.52 (<1 s) | 38.48 (1 s) | 31.09 (1 s) |
