Journal of System Simulation ›› 2026, Vol. 38 ›› Issue (2): 360-371. doi: 10.16182/j.issn1004731x.joss.25-0595

• Machine Learning Algorithms •

Reinforcement Learning Based Method for UAV Team Orienteering Optimization under Multi-constraint Condition

Yang Can, Chen Kai, Zhu Feng   

  1. College of Systems Engineering, National University of Defense Technology, Changsha 410073, China
  • Received: 2025-06-24 Revised: 2025-09-07 Online: 2026-02-18 Published: 2026-02-11
  • Contact: Zhu Feng

Abstract:

Traditional optimization methods suffer from low computational efficiency, while reinforcement learning approaches often yield low-quality solutions and incur high training costs. To address both issues, this paper proposes an attention-based reinforcement learning method. A dynamic attention strategy network with multi-information fusion is designed to improve solution quality. A visibility-graph approach is employed to simplify the threat-zone constraints and speed up convergence, and a decoding sequence reordering mechanism is introduced to further improve the solution. Simulation results show that the method generates high-quality solutions within milliseconds, achieving total rewards that approach or even surpass those that traditional solvers such as OR-Tools and PyVRP obtain in several seconds to hundreds of seconds. Training efficiency also improves significantly: the training time per epoch falls from several hours to about 30 minutes.

Key words: reinforcement learning, team orienteering problem, multi-UAV systems, attention mechanism
