Journal of System Simulation, 2021, Vol. 33, Issue (8): 1766-1774. doi: 10.16182/j.issn1004731x.joss.21-0432

• Special Column: Intelligent Cooperative Adversarial Simulation

Self-learning-based Multiple Spacecraft Evasion Decision Making Simulation Under Sparse Reward Condition

Zhao Yu, Guo Jifeng, Yan Peng, Bai Chengchao

  1. School of Astronautics, Harbin Institute of Technology, Harbin 150001, Heilongjiang, China
  • Received: 2021-05-14 Revised: 2021-06-05 Online: 2021-08-18 Published: 2021-08-19
  • Corresponding author: Guo Jifeng (1977-), male, PhD, professor; research interest: intelligent autonomous technology for unmanned systems. E-mail: guojifeng@hit.edu.cn
  • About the author: Zhao Yu (1992-), male, PhD candidate; research interest: multi-agent reinforcement learning. E-mail: hitzhaoyu@hit.edu.cn
  • Funding:
    National Natural Science Foundation of China (61973101); Aeronautical Science Foundation of China (20180577005)

Abstract: To improve the ability of a spacecraft formation to evade multiple interceptors, and to address the low success rate of traditional pre-programmed evasive maneuvers, a multi-agent cooperative autonomous evasion decision-making method based on deep reinforcement learning is proposed. A multi-agent reinforcement learning algorithm is designed on the Actor-Critic architecture, and a weighted linear fitting method is proposed to solve the credit assignment problem of this self-learning algorithm. For the sparse reward problem in the task scenario, a sparse-reward reinforcement learning method based on the inverse value method is proposed. A space multi-agent adversarial simulation system is established according to the decision-making process of the evasion task and is used to verify the correctness and effectiveness of the proposed algorithms.

Key words: multi-agent, reinforcement learning, sparse reward, evasion maneuver, autonomous decision making
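
The sparse reward setting named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's inverse value method; it only shows why the setting is hard: every intermediate step returns zero reward, and the only learning signal arrives at the end of the episode. The function names, the terminal payoff of +1/-1, and the discount factor are all assumptions made for this illustration and do not come from the paper.

```python
from typing import List


def sparse_evasion_rewards(num_steps: int, evaded: bool) -> List[float]:
    """Per-step rewards for one episode: zero everywhere except the final step."""
    if num_steps < 1:
        raise ValueError("an episode needs at least one step")
    rewards = [0.0] * num_steps
    # Hypothetical terminal payoff: +1 if the formation evaded, -1 otherwise.
    rewards[-1] = 1.0 if evaded else -1.0
    return rewards


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Compute G_t = r_t + gamma * G_{t+1} by sweeping backwards over the episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    rewards = sparse_evasion_rewards(num_steps=5, evaded=True)
    print(rewards)
    print(discounted_returns(rewards, gamma=0.5))
```

In this toy episode the return at early steps shrinks geometrically with the distance to the terminal step, so early decisions receive only a weak signal, which is the difficulty a sparse-reward method has to address.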
