Intelligent Competition Platform and Mode Driven by Cloud-native Simulation

doi:10.16182/j.issn1004731x.joss.24-0922

Abstract

Agent adversarial competitions suffer from difficult development and deployment, low resource utilization, poor reusability, and difficulty integrating reinforcement learning algorithms. To address these problems, a new agent simulation training platform is designed. The software components of the competition platform are decoupled using cloud-native technology; a high-performance simulation engine for competition environments is proposed; and a new method for embedding reinforcement learning models in the intelligent control terminal is designed, with multiple built-in on-policy and off-policy reinforcement learning algorithms. Experiments show three results. First, the system is efficient to develop and deploy: it lowers the hardware requirements for participants, who can log in with one click without installing any application. Second, the system runs efficiently: its cloud-service design effectively addresses weak concurrency, poor reliability, and response delay. Third, the system adapts well to intelligent training algorithms and lets users adjust the training and deduction modules and the main parameters, which lowers the barrier to intelligent training for participants.

Key words: internet mode, cloud-native, reinforcement learning, B/S architecture, component-based modeling

CLC number: TP391.9

Qin Long, Huang Hesong, Yin Lujia, et al. Intelligent Competition Platform and Mode Driven by Cloud-native Simulation[J]. Journal of System Simulation, 2026, 38(4): 988-1003.

Figures and tables (26): Figures 1-19 and Tables 1-7


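Type	Name	Description
Event	SendScheduleEvent	Sends a scheduled event to the engine
Event	User-defined	Event invocation
Interaction	ReceiveInteraction	Receives an interaction
Interaction	SendInteraction	Sends an interaction
Attribute	UpdateObject	Updates entity attributes
Attribute	QueryObjects	Queries entity attributes
Service	RequestforService	Requests an extended simulation service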
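
The interface names in the table above can be pictured with a small in-memory stand-in. This is only a sketch: the method names (SendScheduleEvent, SendInteraction, ReceiveInteraction, UpdateObject, QueryObjects, RequestforService) come from the table, while the class `InMemoryEngine`, every signature, the dict payloads, and the loopback behavior are assumptions made for illustration, not the platform's actual API.

```python
from typing import Any, Callable, Dict, List


class InMemoryEngine:
    """Toy stand-in for the engine; only the method names come from the table."""

    def __init__(self) -> None:
        self.objects: Dict[str, Dict[str, Any]] = {}
        self.scheduled: List[Dict[str, Any]] = []
        self.pending: List[Dict[str, Any]] = []
        self.services: Dict[str, Callable[..., Any]] = {}

    def SendScheduleEvent(self, event: Dict[str, Any]) -> None:
        # Event: queue a scheduled event for the engine (assumed dict payload).
        self.scheduled.append(event)

    def SendInteraction(self, interaction: Dict[str, Any]) -> None:
        # Interaction: this toy loops interactions straight back to the caller.
        self.pending.append(interaction)

    def ReceiveInteraction(self) -> List[Dict[str, Any]]:
        # Interaction: drain and return everything received so far.
        received, self.pending = self.pending, []
        return received

    def UpdateObject(self, object_id: str, **attributes: Any) -> None:
        # Attribute: create or update an entity's attributes.
        self.objects.setdefault(object_id, {}).update(attributes)

    def QueryObjects(self, **filters: Any) -> Dict[str, Dict[str, Any]]:
        # Attribute: return entities whose attributes match every filter.
        return {
            oid: attrs
            for oid, attrs in self.objects.items()
            if all(attrs.get(k) == v for k, v in filters.items())
        }

    def RequestforService(self, name: str, *args: Any, **kwargs: Any) -> Any:
        # Service: call an extension service registered under `name`.
        return self.services[name](*args, **kwargs)


engine = InMemoryEngine()
engine.UpdateObject("unit-1", side="red", health=100)
engine.UpdateObject("unit-2", side="blue", health=80)
engine.SendScheduleEvent({"time": 5.0, "action": "move", "target": "unit-1"})
engine.SendInteraction({"from": "unit-1", "to": "unit-2", "kind": "fire"})
engine.services["distance"] = lambda a, b: abs(a - b)  # hypothetical service

red_units = engine.QueryObjects(side="red")
interactions = engine.ReceiveInteraction()
gap = engine.RequestforService("distance", 3, 10)
```

Grouping the calls by the table's four types (event, interaction, attribute, service) keeps an agent's control code independent of how the real engine transports them.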

Group	Original synchronous framework time	Optimized asynchronous framework time
1	56.72	33.36
2	57.15	32.13
3	56.21	32.89
Average	56.69	32.79

Group	Original synchronous framework time	Optimized asynchronous framework time
1	408.49	390.91
2	404.25	392.84
3	409.93	389.52
Average	407.56	391.09
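
As a quick check on the two timing tables above, the averages and the relative time saved by the asynchronous framework can be recomputed from the per-group values. The sketch below only reuses numbers from the tables; the names `first_table`, `second_table`, and `relative_reduction` are illustrative, and the time unit is left as in the source.

```python
from statistics import mean

# Per-group timings copied from the two tables above (units as in the source).
first_table = {"sync": [56.72, 57.15, 56.21], "async": [33.36, 32.13, 32.89]}
second_table = {"sync": [408.49, 404.25, 409.93], "async": [390.91, 392.84, 389.52]}


def relative_reduction(sync_times, async_times):
    """Fractional drop in mean time when moving from sync to async."""
    sync_avg = mean(sync_times)
    async_avg = mean(async_times)
    return (sync_avg - async_avg) / sync_avg


for label, table in (("first table", first_table), ("second table", second_table)):
    reduction = relative_reduction(table["sync"], table["async"])
    print(
        f"{label}: sync avg {mean(table['sync']):.2f}, "
        f"async avg {mean(table['async']):.2f}, reduction {reduction:.1%}"
    )
```

Run as written, this reproduces the tabulated averages and gives a reduction of roughly 42% for the first table and roughly 4% for the second.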

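Test condition	Network bandwidth/Mbps	Average speed-up ratio	Maximum speed-up ratio	Minimum speed-up ratio
Gigabit broadband	1 000	151.3	201.1	148.5
10-gigabit broadband	10 000	153.4	197.8	144.3
5G	300	152.4	206.3	147.5
4G	40	151.2	196.7	143.1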
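
One way to read the table above is to compare how much the average ratio moves against how much the bandwidth moves. The sketch below uses only the tabulated averages; the variable names are illustrative, and the comparison itself is an added reading rather than an analysis taken from the paper.

```python
# Average ratios from the table above, keyed by network bandwidth in Mbps.
avg_ratio = {1000: 151.3, 10000: 153.4, 300: 152.4, 40: 151.2}

values = list(avg_ratio.values())
overall_mean = sum(values) / len(values)

# Spread of the average ratio across all four network conditions.
spread = max(values) - min(values)
relative_spread = spread / overall_mean

# Ratio between the fastest and slowest tested bandwidth.
bandwidth_ratio = max(avg_ratio) / min(avg_ratio)

print(f"mean of averages: {overall_mean:.1f}")
print(f"relative spread: {relative_spread:.2%}")
print(f"bandwidth range: {bandwidth_ratio:.0f}x")
```

The bandwidth spans a 250-fold range while the average ratio varies by under 2% of its mean, which is consistent with the ratio being largely insensitive to the client's network in these tests.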

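Category	Feature name	Value range	Dimension
Position/m	East-west coordinate	(0, 20 000)	1
	North-south coordinate	(0, 20 000)	1
	Altitude	(0, 100)	1
State information	Heading/(°)	(0, 360)	1
	Velocity/(m/s)	(-3, 3)	3
	Health	(0, 100)	1
	Ammunition count	(0, 3)	1
	Remaining weapon cooldown/s	(0, 10)	1
	Posture (prone, standing)	0, 1	1
Reconnaissance information	Reconnaissance radius/m	(0, 1 000)	1
	Line-of-sight angle/(°)	(0, 120)	1
	Line-of-sight direction (x, y, z)	(-1, 1)	3
	Reconnaissance result	Struct	9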
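
The feature table above adds up to a 25-dimensional observation (16 scalar or vector features plus the 9-dimensional reconnaissance result). A minimal sketch of packing it into a flat vector follows. The ranges and dimensions are taken from the table; the feature keys, the `encode` function, the min-max scaling to [0, 1], and the pass-through of the reconnaissance struct are assumptions for illustration, since the table does not say how the platform normalizes its inputs.

```python
from typing import Dict, List, Sequence, Tuple

# (key, low, high, dimension): ranges and dimensions from the table above.
# The keys themselves are hypothetical names chosen for this sketch.
FEATURES: List[Tuple[str, float, float, int]] = [
    ("east_west_m", 0, 20000, 1),
    ("north_south_m", 0, 20000, 1),
    ("altitude_m", 0, 100, 1),
    ("heading_deg", 0, 360, 1),
    ("velocity_mps", -3, 3, 3),
    ("health", 0, 100, 1),
    ("ammo", 0, 3, 1),
    ("cooldown_s", 0, 10, 1),
    ("posture", 0, 1, 1),
    ("recon_radius_m", 0, 1000, 1),
    ("sight_angle_deg", 0, 120, 1),
    ("sight_direction", -1, 1, 3),
]
RECON_DIM = 9  # reconnaissance result: a struct whose fields the table omits


def encode(raw: Dict[str, Sequence[float]], recon: Sequence[float]) -> List[float]:
    """Flatten raw features into one vector, min-max scaled to [0, 1] (assumed)."""
    obs: List[float] = []
    for key, low, high, dim in FEATURES:
        values = raw[key]
        if len(values) != dim:
            raise ValueError(f"{key}: expected {dim} values, got {len(values)}")
        obs.extend((v - low) / (high - low) for v in values)
    if len(recon) != RECON_DIM:
        raise ValueError(f"recon: expected {RECON_DIM} values, got {len(recon)}")
    obs.extend(recon)  # passed through unscaled because its ranges are unknown
    return obs


raw_example = {
    "east_west_m": [10000], "north_south_m": [5000], "altitude_m": [50],
    "heading_deg": [180], "velocity_mps": [0, 3, -3], "health": [100],
    "ammo": [3], "cooldown_s": [0], "posture": [1],
    "recon_radius_m": [500], "sight_angle_deg": [60],
    "sight_direction": [1, 0, -1],
}
observation = encode(raw_example, [0.0] * RECON_DIM)
```

Keeping the ranges in one table-like list makes it easy to check the vector length against the source table and to swap in a different scaling if the platform uses one.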