Journal of System Simulation ›› 2024, Vol. 36 ›› Issue (9): 2208-2218.doi: 10.16182/j.issn1004731x.joss.23-0584


Research on Autonomous Decision-making in Air-combat Based on Improved Proximal Policy Optimization

Qian Dianwei1, Qi Hongmin1, Liu Zhen2, Zhou Zhiming2, Yi Jianqiang2   

  1. School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China
  2. Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
  • Received:2023-05-18 Revised:2023-06-16 Online:2024-09-15 Published:2024-09-30
  • Contact: Zhou Zhiming

Abstract:

To address the high information redundancy and slow convergence of traditional reinforcement learning in air-combat autonomous decision-making, a proximal policy optimization (PPO) method for autonomous air-combat decision-making based on dual observation and composite reward is proposed. A dual observation space, taking interaction information as the main input and individual feature information as a supplement, is designed to reduce the influence of redundant battlefield information on the training efficiency of the decision model. A composite reward function combining result rewards and process rewards is designed to improve the convergence speed. The generalized advantage estimator (GAE) is applied in the PPO algorithm to improve the accuracy of advantage-function estimation. Simulation results show that the decision-making model of the proposed method can make precise autonomous decisions and complete air-combat tasks according to the battlefield situation in two experimental scenarios: against a fixed-program opponent and against a matrix-gaming opponent.
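The generalized advantage estimator mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation; the paper's network architecture and hyperparameters are not given in the abstract, so the discount factor `gamma` and trace parameter `lam` below are illustrative defaults commonly used with PPO:

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Compute GAE advantages for one episode (illustrative sketch).

    rewards: per-step rewards r_t, length T
    values:  value estimates V(s_t), length T + 1
             (the extra entry bootstraps the terminal state).
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    # Sweep backwards, accumulating discounted TD errors.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```

In PPO, these advantages replace the raw returns in the clipped surrogate objective; the `lam` parameter trades off the bias of one-step TD estimates against the variance of full Monte Carlo returns.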

Key words: reinforcement learning, air-combat autonomous decision-making, dual observation, composite reward, generalized advantage estimator
