Self-learning-based Multiple Spacecraft Evasion Decision Making Simulation Under Sparse Reward Condition

Journal of System Simulation, 2021, Vol. 33, Issue (8): 1766-1774. doi: 10.16182/j.issn1004731x.joss.21-0432

Special column: Intelligent Cooperative Confrontation Simulation

Zhao Yu, Guo Jifeng, Yan Peng, Bai Chengchao

School of Astronautics, Harbin Institute of Technology, Harbin 150001, China

Received: 2021-05-14 Revised: 2021-06-05 Online: 2021-08-18 Published: 2021-08-19
Corresponding author: Guo Jifeng (1977-), male, Ph.D., professor; research interest: intelligent autonomy technology for unmanned systems. E-mail: guojifeng@hit.edu.cn
About the author: Zhao Yu (1992-), male, Ph.D. candidate; research interest: multi-agent reinforcement learning. E-mail: hitzhaoyu@hit.edu.cn
Funding:
National Natural Science Foundation of China (61973101); Aeronautical Science Foundation of China (20180577005)

Abstract: To improve a spacecraft formation's ability to evade multiple interceptors, and to address the low success rate of traditional pre-programmed evasive maneuvers, a multi-agent cooperative autonomous evasion decision-making method based on deep reinforcement learning is proposed. A multi-agent reinforcement learning algorithm is designed on the actor-critic architecture, and a weighted linear fitting method is proposed to solve the credit assignment problem of this self-learning algorithm. For the sparse reward problem in the task scenario, a sparse-reward reinforcement learning method based on the inverse value method is proposed. A space multi-agent adversarial simulation system is established according to the decision process of the evasion task, and it is used to verify the correctness and effectiveness of the proposed algorithm.

Key words: multi-agent, reinforcement learning, sparse reward, evasion maneuver, autonomous decision making

CLC number: TP391.9

Zhao Yu, Guo Jifeng, Yan Peng, Bai Chengchao. Self-learning-based Multiple Spacecraft Evasion Decision Making Simulation Under Sparse Reward Condition[J]. Journal of System Simulation, 2021, 33(8): 1766-1774.

