A DRL-based Approach for Distributed Equipment Nodes Selection

doi:10.16182/j.issn1004731x.joss.24-0222

Abstract

To address the slow solution speed and poor generalization of traditional algorithms in large-scale scenarios, this paper applies deep reinforcement learning (DRL) to the large-scale distributed equipment system selection problem. Based on the characteristics of distributed equipment system combat, the system is modeled as a graph using complex networks, and the edge relationships between pieces of equipment are represented with an attention mechanism, in order to build a digital simulation environment for the distributed equipment system. Simulation results show that, compared with a genetic algorithm (GA), the resulting model has clear advantages in solution time and generalization, and effectively improves the performance of the decision model for selecting combinations of distributed equipment nodes in large-scale scenarios.

Key words: DRL, graph neural network, attention mechanism, complex network, distributed system of equipment

CLC number:

TP391.9

Wang Ziyi, Zhang Kai, Qian Dianwei, et al. A DRL-based Approach for Distributed Equipment Nodes Selection[J]. Journal of System Simulation, 2025, 37(6): 1565-1573.

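Table 3 Training hyperparameters of the algorithm

Hyperparameter	Value	Hyperparameter	Value
Training epochs	100	Learning rate	$1 \times 10^{-4}$
Batch size	640	Number of attention heads	8
Number of samples	64 000	Hidden layer dimension	128
Number of encoder layers	3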


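Edge type	Description
Target node - reconnaissance node	Edge between the target layer and the reconnaissance layer; the reconnaissance equipment is able to reconnoiter the target
Reconnaissance node - command-and-control node	Edge between the reconnaissance layer and the command-and-control layer; the reconnaissance equipment can send information to the command post
Command-and-control node - attack node	Edge between the command-and-control layer and the attack layer; the command post can issue orders to the equipment
Attack node - target node	Edge between the attack layer and the target layer; the equipment is able to attack the target node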
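
The four edge types in the table above chain from the target layer through the reconnaissance, command-and-control, and attack layers and back to the target. A minimal Python sketch of that layered structure follows. It is an illustration only, not the paper's implementation: the layer names (`target`, `recon`, `c2`, `attack`), the `(layer, id)` node encoding, and the helpers `build_graph` and `closed_loops` are all assumptions introduced here.

```python
from collections import defaultdict

# The four edge types from the table, in the order they chain together.
# Layer names are hypothetical labels chosen for this sketch.
EDGE_TYPES = [
    ("target", "recon"),   # reconnaissance equipment can reconnoiter the target
    ("recon", "c2"),       # reconnaissance equipment can report to the command post
    ("c2", "attack"),      # command post can issue orders to the equipment
    ("attack", "target"),  # attack equipment can strike the target
]


def build_graph(edges):
    """Build an adjacency dict from ((layer, id), (layer, id)) pairs.

    Rejects any edge whose layer pair is not one of the four table types.
    """
    allowed = set(EDGE_TYPES)
    adj = defaultdict(list)
    for src, dst in edges:
        if (src[0], dst[0]) not in allowed:
            raise ValueError(f"edge type {src[0]}->{dst[0]} is not in the table")
        adj[src].append(dst)
    return adj


def closed_loops(adj, target):
    """List every target -> recon -> c2 -> attack -> target path for `target`."""
    loops = []
    for r in adj[target]:
        for c in adj[r]:
            for a in adj[c]:
                if target in adj[a]:
                    loops.append((target, r, c, a, target))
    return loops


# Toy instance: one target, two reconnaissance nodes, one command post, one attacker.
edges = [
    (("target", 0), ("recon", 0)),
    (("target", 0), ("recon", 1)),
    (("recon", 0), ("c2", 0)),
    (("recon", 1), ("c2", 0)),
    (("c2", 0), ("attack", 0)),
    (("attack", 0), ("target", 0)),
]
adj = build_graph(edges)
loops = closed_loops(adj, ("target", 0))
print(len(loops))
```

In this toy instance each reconnaissance node gives the target its own path through the shared command post and attacker, so the enumeration finds one loop per reconnaissance node.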

Algorithm	100 nodes		150 nodes		200 nodes
	Total index value	t/s	Total index value	t/s	Total index value	t/s
GAT-RL	0.236 1	0.25	0.224 3	0.31	0.247 9	0.60
GA	0.269 6	751.07	0.326 8	1 763.26	0.389 3	2 892.24
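
To put the solution-time gap in concrete terms, the ratio of GA time to GAT-RL time can be computed directly from the t/s columns of the table above. The short Python sketch below does only that arithmetic; the dictionary and variable names are illustrative, and the figures are copied from the table rather than re-measured.

```python
# Solution times in seconds, copied from the table (keyed by number of nodes).
gat_rl_time = {100: 0.25, 150: 0.31, 200: 0.60}
ga_time = {100: 751.07, 150: 1763.26, 200: 2892.24}

# Ratio of GA solution time to GAT-RL solution time at each scale.
time_ratio = {n: ga_time[n] / gat_rl_time[n] for n in gat_rl_time}

for n, ratio in time_ratio.items():
    print(f"{n} nodes: GA takes about {ratio:.0f} times as long as GAT-RL")
```

At all three scales the ratio is in the thousands, which is the scale of difference behind the abstract's statement about solution time.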

Total nodes	Target nodes	Reconnaissance nodes	Command-and-control nodes	Attack nodes
90	5	45	10	30
110	5	55	10	40

Algorithm	90 nodes		110 nodes		150 nodes		200 nodes
	Total index value	t/s	Total index value	t/s	Total index value	t/s	Total index value	t/s
GAT-RL	0.261 3	0.60	0.255 6	0.59	0.224 3	0.31	0.247 9	0.60
100-node model (generalization)	0.255 4	0.58	0.254 2	0.60	0.236 9	0.26	0.257 1	0.60
GA	0.266 8	515.95	0.280 1	797.53	0.326 8	1 763.26	0.389 3	2 892.24
