| [1] |
尹奇跃, 赵美静, 倪晚成, 等. 兵棋推演的智能决策技术与挑战[J]. 自动化学报, 2023, 49(5): 913-928.
|
|
Yin Qiyue, Zhao Meijing, Ni Wancheng, et al. Intelligent Decision Making Technology and Challenge of Wargame[J]. Acta Automatica Sinica, 2023, 49(5): 913-928.
|
| [2] |
罗俊仁, 张万鹏, 项凤涛, 等. 智能推演综述: 博弈论视角下的战术战役兵棋与战略博弈[J]. 系统仿真学报, 2023, 35(9): 1871-1894.
|
|
Luo Junren, Zhang Wanpeng, Xiang Fengtao, et al. Survey on Intelligent Wargaming: Tactical & Campaign Wargame and Strategic Game from Game-theoretic Perspective[J]. Journal of System Simulation, 2023, 35(9): 1871-1894.
|
| [3] |
Vinyals O, Ewalds T, Bartunov S, et al. StarCraft II: A New Challenge for Reinforcement Learning[EB/OL]. (2017-08-16) [2025-04-16]. .
|
| [4] |
中国科学院. 庙 算 ⋅ 陆 战 指 挥 官[EB/OL]. [2025-04-16]. .
|
| [5] |
Luo Haowen, Lee Chang-Hun, Li Chaoyong, et al. Generative Adversarial Imitation Learning-based Continuous Learning Computational Guidance[J]. IEEE Transactions on Aerospace and Electronic Systems, 2025, 61(3): 6809-6821.
|
| [6] |
Yu Lantao, Song Jiaming, Ermon S. Multi-agent Adversarial Inverse Reinforcement Learning[C]//Proceedings of the 36th International Conference on Machine Learning. Chia Laguna Resort: PMLR, 2019: 7194-7201.
|
| [7] |
Peng Yong, Zeng Junjie, Hu Yue, et al. Reinforcement Learning from Suboptimal Demonstrations Based on Reward Relabeling[J]. Expert Systems with Applications, 2024, 255, Part B: 124580.
|
| [8] |
Ross Stéphane, Gordon G J, Bagnell J A. A Reduction of Imitation Learning and Structured Prediction to No-regret Online Learning[C]//Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics. Chia Laguna Resort: PMLR, 2011: 627-635.
|
| [9] |
Zhan E, Zheng S, Yue Yisong, et al. Generating Multi-agent Trajectories Using Programmatic Weak Supervision[EB/OL]. (2019-02-22) [2025-04-16]. .
|
| [10] |
Wang Hongwei, Yu Lantao, Cao Zhangjie, et al. Multi-agent Imitation Learning with Copulas[C]//Machine Learning and Knowledge Discovery in Databases. Research Track. Cham: Springer International Publishing, 2021: 139-156.
|
| [11] |
Oh J, Guo Yijie, Singh S, et al. Self-imitation Learning[EB/OL]. (2018-06-14) [2025-04-16]. .
|
| [12] |
Hao Peng, Lu Tao, Cui Shaowei, et al. SOZIL: Self-optimal Zero-shot Imitation Learning[J]. IEEE Transactions on Cognitive and Developmental Systems, 2023, 15(4): 2077-2088.
|
| [13] |
Lee Donghun, Park In-Beom, Kim Kwanho. A Self-imitation Learning Approach for Scheduling Evaporation and Encapsulation Stages of OLED Display Manufacturing Systems[J]. Robotics and Computer-Integrated Manufacturing, 2025, 93: 102917.
|
| [14] |
Tampuu Ardi, Matiisen Tambet, Kodelja Dorian, et al. Multiagent Cooperation and Competition with Deep Reinforcement Learning[J]. PLoS One, 2017, 12(4): e0172395.
|
| [15] |
Mnih V, Kavukcuoglu K, Silver D, et al. Human-level Control Through Deep Reinforcement Learning[J]. Nature, 2015, 518(7540): 529-533.
|
| [16] |
Gupta J K, Egorov M, Kochenderfer M. Cooperative Multi-agent Control Using Deep Reinforcement Learning[C]//International Conference on Autonomous Agents and Multiagent Systems. Cham: Springer International Publishing, 2017: 66-83.
|
| [17] |
Kraemer L, Banerjee B. Multi-agent Reinforcement Learning as a Rehearsal for Decentralized Planning[J]. Neurocomputing, 2016, 190: 82-94.
|
| [18] |
Sunehag P, Lever G, Gruslys A, et al. Value-decomposition Networks for Cooperative Multi-agent Learning[EB/OL]. (2017-06-16) [2025-04-16]. .
|
| [19] |
Rashid T, Samvelyan Mikayel, Christian Schroeder De Witt, et al. Monotonic Value Function Factorisation for Deep Multi-agent Reinforcement Learning[J]. The Journal of Machine Learning Research, 2020, 21(1): 178.
|
| [20] |
Wang Jianhao, Ren Zhizhou, Liu T, et al. QPLEX: Duplex Dueling Multi-agent Q-learning[EB/OL]. (2021-10-04) [2025-04-16]. .
|
| [21] |
Son Kyunghwan, Kim Daewoo, Wan Ju Kang, et al. QTRAN: Learning to Factorize with Transformation for Cooperative Multi-agent Reinforcement Learning[C]//Proceedings of the 36th International Conference on Machine Learning. Chia Laguna Resort: PMLR, 2019: 5887-5896.
|
| [22] |
Tang Hongyao, Hao Jianye, Tangjie Lü, et al. Hierarchical Deep Multiagent Reinforcement Learning with Temporal Abstraction[EB/OL]. (2019-07-04) [2025-04-16]. .
|
| [23] |
Rashid T, Farquhar G, Peng Bei, et al. Weighted QMIX: Expanding Monotonic Value Function Factorisation for Deep Multi-agent Reinforcement Learning[C]//Proceedings of the 34th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2020: 10199-10210.
|
| [24] |
Bernstein D S, Givan R, Immerman N, et al. The Complexity of Decentralized Control of Markov Decision Processes[J]. Mathematics of Operations Research, 2002, 27(4): 819-840.
|