Journal of System Simulation, 2025, Vol. 37, Issue (5): 1142-1157. DOI: 10.16182/j.issn1004731x.joss.24-0045

• Review •

Survey on Large Language Agent Technologies for Intelligent Game Theoretic Decision-making

Gu Xueqiang, Luo Junren, Zhou Yanzhong, Zhang Wanpeng

  1. College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China
  • Received: 2024-01-12  Revised: 2024-05-30  Online: 2025-05-20  Published: 2025-05-23
  • Corresponding author: Luo Junren
  • About the first author: Gu Xueqiang (1983-), male, associate research fellow, Ph.D.; research interests: intelligent planning and decision-making, edge intelligence.
  • Funding:
    National Natural Science Foundation of China (61806212)

Abstract:

Advances in artificial intelligence have profoundly changed the paradigm for solving intelligent game-theoretic decision-making problems. The goal has shifted from optimal solutions to equilibrium solutions and then to adaptive solutions. Building adaptive decision-making agents for intelligent games on top of generative large models remains highly challenging. In strongly adversarial game environments, force allocation and multi-entity coordination are the core problems in the study of force deployment and operational coordination. The survey draws on strategy reinforcement learning, strategy game-tree search, and strategy preference voting methods built on skill-based, ranking-based, and preference-based meta-game models. From these, it designs a large-model agent architecture that supports planning at generation time. The architecture can be aligned with the commander's intent and is feasible, applicable, and extensible. It can also provide interpretable strategy recommendations for the adaptive decision-making process. Key technical requirements are analyzed from three aspects: foundation model construction, goal-guided game reinforcement learning, and open-ended meta-game strategy learning. The survey is intended as a reference for interdisciplinary research that combines reinforcement learning models, game learning models, and generative large language models.

Key words: adaptive, force allocation, multi-entity coordination, multi-agent reinforcement learning, meta-game, large language model, chain of thought
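The abstract refers to strategy preference voting over meta-game models as a route to interpretable strategy recommendations. The sketch below illustrates one simple reading of that idea and is not the authors' method. It assumes an empirical meta-game payoff table in which each opponent strategy acts as a "voter" that ranks the candidate strategies by payoff against it. The rankings are then aggregated with a Borda count, which is a swapped-in, standard social-choice rule. The strategy names, the payoff numbers, and the helpers `borda_scores`, `ballots_from_meta_game` and `recommend` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical empirical meta-game: EXAMPLE_PAYOFF[c][o] is the payoff (e.g. win rate)
# of candidate strategy c against opponent strategy o. Numbers are illustrative only.
EXAMPLE_PAYOFF: Dict[str, Dict[str, float]] = {
    "rush":   {"A": 0.7, "B": 0.2, "C": 0.5},
    "defend": {"A": 0.5, "B": 0.6, "C": 0.6},
    "flank":  {"A": 0.4, "B": 0.5, "C": 0.7},
}


def borda_scores(ballots: List[List[str]]) -> Dict[str, int]:
    """Borda count: on a ballot of n candidates, 0-based rank k earns n - 1 - k points."""
    scores: Dict[str, int] = {}
    for ballot in ballots:
        n = len(ballot)
        for rank, candidate in enumerate(ballot):
            scores[candidate] = scores.get(candidate, 0) + (n - 1 - rank)
    return scores


def ballots_from_meta_game(payoff: Dict[str, Dict[str, float]],
                           opponents: List[str]) -> List[List[str]]:
    """Each opponent acts as a voter ranking the candidates by payoff against it, best first."""
    candidates = list(payoff)
    return [sorted(candidates, key=lambda c: payoff[c][opp], reverse=True)
            for opp in opponents]


def recommend(payoff: Dict[str, Dict[str, float]],
              opponents: List[str]) -> Tuple[str, Dict[str, int]]:
    """Return the Borda winner plus the full score table, which serves as the explanation."""
    scores = borda_scores(ballots_from_meta_game(payoff, opponents))
    # Iterate in alphabetical order so that max() breaks ties deterministically.
    best = max(sorted(scores), key=lambda c: scores[c])
    return best, scores


if __name__ == "__main__":
    best, scores = recommend(EXAMPLE_PAYOFF, ["A", "B", "C"])
    print(best, scores)  # defend {'rush': 2, 'defend': 4, 'flank': 3}
```

In this toy table, "defend" is ranked first only against opponent B. It still wins overall because it places second against A and C. The score table shows exactly why it was recommended. Other aggregation rules could be substituted in `recommend`, such as mean payoff or equilibrium-based ratings.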
