Strategy Optimization Method of Multi-dimension Projection Based on Deep Reinforcement Learning

doi:10.16182/j.issn1004731x.joss.22-0886

Abstract

Building on the strong performance of deep reinforcement learning (DRL) in strategy optimization, this paper proposes a strategy optimization method that takes multi-dimension projection actions as its main research object. The method combines simulation experiments with DRL. After reviewing the current state of strategy optimization research, a deep learning framework is selected to suit the research problem, and a DRL multi-dimension projection strategy model based on the asynchronous advantage actor-critic (A3C) algorithm is constructed. Through simulation experiments, the DRL model learns interactively with an "out-of-the-loop" simulation, yielding an optimized multi-dimension projection strategy. Finally, the effectiveness of the cooperative strategy optimization between the DRL framework and the simulation experiment is verified.
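As a rough illustration of the quantity an advantage actor-critic method such as A3C optimizes, the sketch below computes bootstrapped n-step returns and the resulting advantages for one rollout segment. It is a minimal sketch and not the paper's model: the function names, the reward values, the critic estimates, and the discount factor `gamma` are all hypothetical placeholders, since the abstract does not specify them.

```python
from typing import List


def n_step_returns(rewards: List[float], bootstrap_value: float, gamma: float) -> List[float]:
    """Discounted returns R_t = r_t + gamma * R_{t+1}, seeded with the critic's
    value estimate for the state that follows the last step of the segment."""
    returns = [0.0] * len(rewards)
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def advantages(returns: List[float], values: List[float]) -> List[float]:
    """Advantage A_t = R_t - V(s_t): how much better the rollout did than the
    critic predicted. The actor is pushed toward actions with positive A_t."""
    return [r - v for r, v in zip(returns, values)]


# Hypothetical three-step rollout collected by one worker (placeholder numbers).
rewards = [1.0, 0.0, 2.0]
critic_values = [1.0, 1.0, 2.0]   # assumed V(s_t) estimates for the visited states
returns = n_step_returns(rewards, bootstrap_value=0.5, gamma=0.5)
adv = advantages(returns, critic_values)
```

In A3C, several workers would each run such rollouts against their own copy of the simulation and apply the resulting gradients asynchronously to shared parameters; that part is omitted here.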

Key words: deep reinforcement learning (DRL), simulation, strategy optimization, multi-dimension projection, asynchronous advantage actor-critic (A3C) algorithm

CLC Number: TP391.9

An Jing, Si Guangya, Zhang Lei. Strategy Optimization Method of Multi-dimension Projection Based on Deep Reinforcement Learning[J]. Journal of System Simulation, 2024, 36(1): 39-49.

Figures/Tables (10): Fig. 1 to Fig. 7, Table 1 to Table 3

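No.	Combat force	State parameters
1	Combat units (excluding projection formations), 69 in total	Remaining number
2	Projection formations (excluding transport helicopters), 3 in total	Remaining number, current speed, departure status
3	Transport helicopter formations, 3 in total	Remaining number, current speed, departure status, current altitude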
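The state parameters listed in the table above can be flattened into a single observation vector. The following is a minimal sketch under stated assumptions: the class names, the field order, the numeric placeholder values, and the encoding of departure status as 0.0 or 1.0 are hypothetical, and no normalization is applied. Only the counts (69 combat units, 3 projection formations, 3 helicopter formations) and the parameter lists come from the table.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CombatUnit:
    remaining: float


@dataclass
class ProjectionFormation:
    remaining: float
    speed: float
    departed: bool


@dataclass
class HelicopterFormation:
    remaining: float
    speed: float
    departed: bool
    altitude: float


def encode_state(units: List[CombatUnit],
                 formations: List[ProjectionFormation],
                 helicopters: List[HelicopterFormation]) -> List[float]:
    """Concatenate all state parameters into one flat vector (assumed layout)."""
    state: List[float] = []
    for u in units:
        state.append(u.remaining)
    for f in formations:
        state.extend([f.remaining, f.speed, float(f.departed)])
    for h in helicopters:
        state.extend([h.remaining, h.speed, float(h.departed), h.altitude])
    return state


# Placeholder values; only the counts are taken from the table.
units = [CombatUnit(remaining=10.0) for _ in range(69)]
formations = [ProjectionFormation(5.0, 0.0, False) for _ in range(3)]
helicopters = [HelicopterFormation(4.0, 0.0, False, 0.0) for _ in range(3)]
state = encode_state(units, formations, helicopters)
print(len(state))  # 69*1 + 3*3 + 3*4 = 90
```

Under this layout the observation has 90 components; a different grouping or additional features would change that figure.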

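No.	Combat force	Action space
1	Projection forces that have not yet acted	[0,1] (0 = keep waiting, 1 = depart)
2	Projection forces that have not yet acted but have been selected to act in the current time step	Force size, speed, altitude (helicopters)
3	Projection forces that have already acted	Speed, altitude (helicopters)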
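The three rows of the action table amount to a status-dependent action structure: which action components apply to a projection force depends on whether it has departed. A minimal sketch of that selection logic follows; the function name, its parameters, and the field labels are hypothetical and chosen only for illustration.

```python
from typing import List


def action_fields(departed: bool, departing_now: bool, is_helicopter: bool) -> List[str]:
    """Return the action components that apply to one projection force.

    departed:       the force has already acted (row 3 of the table)
    departing_now:  the force has not acted yet but was selected to depart in
                    the current time step (row 2 of the table)
    """
    if not departed and not departing_now:
        # Row 1: only the binary choice, 0 = keep waiting, 1 = depart.
        return ["depart"]
    fields = ["speed"]
    if not departed:
        # Row 2: a force departing in this step also chooses its size.
        fields.insert(0, "size")
    if is_helicopter:
        # Altitude is only controlled for helicopter formations.
        fields.append("altitude")
    return fields


waiting = action_fields(departed=False, departing_now=False, is_helicopter=True)
launching = action_fields(departed=False, departing_now=True, is_helicopter=True)
underway = action_fields(departed=True, departing_now=False, is_helicopter=False)
```

A policy network could use such a list to mask out components that do not apply at a given step, although how the paper's model handles this is not stated in the table.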

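Simulation time/h	Action
0	Combat forces take off
2.0	Amphibious transport formation begins its sea crossing
4.0	Airborne transport formation takes off
4.5	Maritime formation unloads; airborne formation conducts airborne and heliborne landing
5.0	Landing forces assemble on the island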
