Journal of System Simulation ›› 2023, Vol. 35 ›› Issue (2): 423-434. doi: 10.16182/j.issn1004731x.joss.21-0879


DQN-based Joint Scheduling Method of Heterogeneous TT&C Resources

Naiyang Xue1, Dan Ding2, Yutong Jia1, Zhiqiang Wang1, Yuan Liu3

  1. Graduate School, Space Engineering University, Beijing 101416, China
  2. Department of Electronic and Optical Engineering, Space Engineering University, Beijing 101416, China
  3. PLA 61646 Troops, Beijing 100192, China
  • Received: 2021-08-31 Revised: 2021-10-11 Online: 2023-02-28 Published: 2023-02-16
  • Contact: Dan Ding E-mail: 2163628670@qq.com; ddnjr@163.com
  • About the author: Naiyang Xue (born 1997), male, master's student; research interest: TT&C and communication technology. E-mail: 2163628670@qq.com

Abstract:

This paper addresses the joint scheduling of heterogeneous TT&C network resources and proposes a deep Q network (DQN) algorithm based on reinforcement learning. After the characteristics of the joint scheduling problem are analyzed, the constraints that affect its solution are described mathematically and a joint resource scheduling model is established. To apply reinforcement learning, the problem is formulated as a Markov decision process; two neural networks with the same structure and an action selection strategy based on the ε-greedy algorithm are then designed, and a DQN solution framework is built. Simulation results show that the DQN-based heterogeneous TT&C resource scheduling method finds TT&C scheduling schemes with higher scheduling revenue than the genetic algorithm.

Key words: telemetry, tracking and command (TT&C); joint scheduling of heterogeneous TT&C resources; deep Q network; scheduling revenue; reinforcement learning
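
The two components named in the abstract, ε-greedy action selection and a pair of identically structured networks, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no state encoding, network architecture, or hyperparameters, so the `LinearQ` class (a linear stand-in for a neural network), the function names, and all numeric values below are assumptions chosen only for illustration.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon explore (random action), otherwise exploit (argmax)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_target, gamma, done):
    """Standard DQN target: r if the episode ended, else r + gamma * max_a' Q_target(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


class LinearQ:
    """Hypothetical stand-in for a Q-network: Q(s, a) = w[a] . s (assumption, kept tiny)."""

    def __init__(self, n_features, n_actions):
        self.w = [[0.0] * n_features for _ in range(n_actions)]

    def predict(self, state):
        # One Q-value per action.
        return [sum(wi * si for wi, si in zip(row, state)) for row in self.w]

    def update(self, state, action, target, lr):
        # One gradient step on the squared error (target - Q(s, a))^2 for the taken action.
        error = target - self.predict(state)[action]
        self.w[action] = [wi + lr * error * si for wi, si in zip(self.w[action], state)]

    def copy_from(self, other):
        # Periodic synchronisation of the target network with the online network.
        self.w = [row[:] for row in other.w]


# Two networks with the same structure: one is trained, one supplies stable targets.
online = LinearQ(n_features=2, n_actions=3)
target = LinearQ(n_features=2, n_actions=3)
```

In a training loop of this shape, `online` would be updated at every step toward `td_target(...)` computed from `target.predict(next_state)`, and `target.copy_from(online)` would be called every fixed number of steps. In a scheduling setting an action might correspond to assigning a task to a resource and the reward to the scheduling revenue, but that mapping is an assumption here, not a detail taken from the abstract.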

CLC Number: