Journal of System Simulation ›› 2024, Vol. 36 ›› Issue (6): 1452-1467.doi: 10.16182/j.issn1004731x.joss.23-0349

• Papers •

Curriculum Learning-based Simulation of UAV Air Combat Under Sparse Rewards

Zhu Jingyu1, Zhang Hongli1, Kuang Minchi2, Shi Heng2, Zhu Jihong2, Qiao Zhi2, Zhou Wenqing3

  1. School of Electrical Engineering, Xinjiang University, Urumqi 830000, China
    2.Department of Precision Instrument, Tsinghua University, Beijing 100084, China
    3.Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  • Received:2023-03-29 Revised:2023-05-18 Online:2024-06-28 Published:2024-06-19
  • Contact: Zhang Hongli E-mail:zhujingyu@stu.xju.edu.cn;zhl@xju.edu.cn

Abstract:

To address the limited exploration capability and sparse rewards of conventional reinforcement learning methods in air combat environments, a curriculum learning distributed proximal policy optimization (CLDPPO) reinforcement learning algorithm is proposed. A reward function informed by professional empirical knowledge is integrated, a discrete action space is developed, and value and decision networks with separated global and local observations are established. A methodology is presented for unmanned aerial vehicles (UAVs) to acquire combat expertise through a sequence of fundamental courses whose offensive, defensive, and comprehensive content progressively intensifies. The experimental results show that the methodology surpasses the expert system and other mainstream reinforcement learning algorithms, enables the autonomous acquisition of air combat tactics, and mitigates the sparse-reward problem.
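The curriculum described in the abstract, a sequence of fundamental courses whose offensive, defensive, and comprehensive content progressively intensifies, can be sketched as a stage scheduler that advances once the agent's rolling mean episode reward clears a per-stage threshold. The stage names, thresholds, window size, and advancement rule below are illustrative assumptions for exposition, not the paper's actual CLDPPO implementation.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Stage:
    """One curriculum course (hypothetical names/thresholds)."""
    name: str
    reward_threshold: float  # rolling mean reward needed to advance


class CurriculumScheduler:
    """Advances to the next stage when the rolling mean episode
    reward over `window` episodes reaches the current threshold."""

    def __init__(self, stages, window=100):
        self.stages = stages
        self.idx = 0
        self.rewards = deque(maxlen=window)

    @property
    def current(self):
        return self.stages[self.idx]

    def record(self, episode_reward):
        """Log one episode's return; return the (possibly new) stage name."""
        self.rewards.append(episode_reward)
        window_full = len(self.rewards) == self.rewards.maxlen
        if window_full:
            mean = sum(self.rewards) / len(self.rewards)
            if (mean >= self.current.reward_threshold
                    and self.idx < len(self.stages) - 1):
                self.idx += 1        # graduate to the next course
                self.rewards.clear()  # fresh window for the new stage
        return self.current.name


# Illustrative three-course curriculum (offensive -> defensive -> comprehensive)
scheduler = CurriculumScheduler(
    [Stage("offensive", 0.5), Stage("defensive", 0.6), Stage("comprehensive", 0.7)],
    window=3,
)
```

In a full training loop, the returned stage name would select the environment configuration (opponent behavior, initial geometry, reward shaping) for the next episode batch.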

Key words: UAVs, air combat, sparse reward, curriculum learning, distributed proximal policy optimization (DPPO)
