Journal of System Simulation, 2024, Vol. 36, Issue (7): 1670-1681. DOI: 10.16182/j.issn1004731x.joss.23-0443

• Research Paper •

Task Analysis Methods Based on Deep Reinforcement Learning

Gong Xue1, Peng Pengfei1, Rong Li1, Zheng Yalian2, Jiang Jun1

  1. Naval University of Engineering, Wuhan 430033, China
  2. State Key Laboratory of Water Resources and Hydropower Engineering Science, Wuhan University, Wuhan 430072, China
  • Received: 2023-04-14 Revised: 2023-06-01 Online: 2024-07-15 Published: 2024-07-12
  • Corresponding author: Rong Li E-mail: gogxue@163.com; 33574319@qq.com
  • First author: Gong Xue (1998-), female, master's student; research interests: artificial intelligence and big data. E-mail: gogxue@163.com
  • Funding:
    National Key Research and Development Program of China (2017YFC1405205); Scientific Research and Development Fund of Naval University of Engineering, Independent Project (425317S107)

Abstract:

To address the high coupling of task interactions and the many influencing factors involved in task analysis, a task analysis method based on sequence decoupling and deep reinforcement learning (DRL) is proposed, which achieves task decomposition and task sequence reconstruction under complex constraints. The method designs a DRL environment based on task information interaction and improves the SumTree algorithm using the difference between the loss functions of the target network and the evaluation network, enabling priority evaluation among tasks. The activation-function mechanism is introduced into the DRL network to extract task features; a greedy activation factor is proposed to optimize the parameters of the deep neural network and determine the optimal state of the agent, thereby driving the agent's state transitions. A multi-objective task execution sequence diagram is generated through experience replay. Simulation results show that the method can generate executable task graphs under optimal scheduling and that, compared with static scenarios, it adapts well to dynamic scenarios, indicating good prospects for wider application in domain task planning.
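To make the replay-prioritization idea concrete, the sketch below pairs a standard SumTree (the structure the abstract says the method improves) with a hypothetical priority rule that grows with the gap between the target-network and evaluation-network losses. The class layout, the priority_from_loss_gap helper, and the eps/alpha parameters are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class SumTree:
    """Binary sum tree for prioritized experience replay.

    Leaves hold transition priorities; each internal node holds the sum of
    its children, so priority-proportional sampling costs O(log n).
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = np.zeros(2 * capacity - 1)  # internal nodes followed by leaves
        self.data = [None] * capacity           # stored transitions
        self.write = 0                          # next leaf slot to overwrite

    def add(self, priority, transition):
        leaf = self.write + self.capacity - 1
        self.data[self.write] = transition
        self.update(leaf, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, leaf, priority):
        change = priority - self.tree[leaf]
        self.tree[leaf] = priority
        while leaf != 0:                        # propagate the change up to the root
            leaf = (leaf - 1) // 2
            self.tree[leaf] += change

    def sample(self, value):
        """Descend from the root; return (leaf index, priority, transition)."""
        idx = 0
        while 2 * idx + 1 < len(self.tree):     # stop once idx is a leaf
            left = 2 * idx + 1
            if value <= self.tree[left]:
                idx = left
            else:
                value -= self.tree[left]
                idx = left + 1
        return idx, self.tree[idx], self.data[idx - self.capacity + 1]


def priority_from_loss_gap(loss_target, loss_eval, eps=1e-2, alpha=0.6):
    """Hypothetical priority rule (assumption): transitions on which the
    target and evaluation networks disagree most are replayed more often."""
    return (abs(loss_target - loss_eval) + eps) ** alpha


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tree = SumTree(capacity=8)
    for i in range(8):
        p = priority_from_loss_gap(loss_target=rng.random(), loss_eval=rng.random())
        tree.add(p, transition=f"task-{i}")
    # Draw proportionally to priority: pick a value in [0, total priority).
    idx, p, tr = tree.sample(rng.uniform(0, tree.tree[0]))
    print(idx, round(p, 3), tr)
```

Sampling a minibatch just repeats the draw with stratified values over [0, total priority); after each training step, update() refreshes the sampled leaves with fresh priorities so the replay distribution tracks the current loss gap.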

Key words: task analysis, reinforcement learning, evaluation network, greedy factors, coupled tasks, activation functions

CLC number: