Task Analysis Methods Based on Deep Reinforcement Learning

doi:10.16182/j.issn1004731x.joss.23-0443

Abstract

To address the tight coupling of task interactions and the large number of influencing factors in task analysis, a task analysis method based on sequence decoupling and deep reinforcement learning (DRL) is proposed. The method performs task decomposition and task sequence reconstruction under complex constraints. A DRL environment is designed around the information exchanged between tasks, and the SumTree algorithm is improved using the difference between the loss functions of the target network and the evaluation network, which yields a priority evaluation among tasks. An activation function mechanism is introduced into the DRL network to extract task features, and a greedy activation factor is proposed to optimize the parameters of the deep neural network and to determine the optimal state of the agent, which facilitates its state transitions. A multi-objective task execution sequence diagram is then generated through experience replay. Simulation results show that the method generates executable task diagrams under optimal scheduling and shows better adaptability in dynamic scenarios than in static scenarios, indicating good prospects for wide application in domain task planning.
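For orientation, the sketch below shows a standard SumTree of the kind commonly used for prioritized experience replay. It is not the improved variant described in the abstract, whose details are not given here. The class name `SumTree`, the helper `priority_from_td_error`, and the values `alpha=0.6` and `eps=0.01` are assumptions chosen for illustration.

```python
class SumTree:
    """Binary tree in heap layout: leaves hold priorities, internal nodes hold sums."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity - 1)  # last `capacity` entries are leaves
        self.data = [None] * capacity           # stored transitions (or tasks)
        self.write = 0                          # next slot to overwrite (ring buffer)

    def total(self):
        """Sum of all leaf priorities (stored at the root)."""
        return self.tree[0]

    def update(self, leaf, priority):
        """Set one leaf's priority and propagate the change up to the root."""
        change = priority - self.tree[leaf]
        self.tree[leaf] = priority
        while leaf != 0:
            leaf = (leaf - 1) // 2
            self.tree[leaf] += change

    def add(self, priority, item):
        """Store an item with the given priority, overwriting the oldest slot."""
        leaf = self.write + self.capacity - 1
        self.data[self.write] = item
        self.update(leaf, priority)
        self.write = (self.write + 1) % self.capacity

    def get(self, value):
        """Return (leaf index, priority, item) for a cumulative value in [0, total)."""
        idx = 0
        while 2 * idx + 1 < len(self.tree):     # descend until a leaf is reached
            left = 2 * idx + 1
            if value < self.tree[left]:
                idx = left
            else:
                value -= self.tree[left]
                idx = left + 1
        return idx, self.tree[idx], self.data[idx - self.capacity + 1]


def priority_from_td_error(td_error, alpha=0.6, eps=0.01):
    """ASSUMPTION: conventional proportional priority, not the paper's improved rule."""
    return (abs(td_error) + eps) ** alpha
```

Sampling a value uniformly from `[0, tree.total())` and passing it to `get` selects each stored item with probability proportional to its priority, so items with larger errors are replayed more often.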

Key words: task analysis, reinforcement learning, evaluation network, greedy factors, coupled tasks, activation functions

CLC Number: E917

Gong Xue, Peng Pengfei, Rong Li, Zheng Yalian, Jiang Jun. Task Analysis Methods Based on Deep Reinforcement Learning[J]. Journal of System Simulation, 2024, 36(7): 1670-1681.



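Parameter	Value	Meaning
γ_max	0.98	Empirical discount rate γ
EXPLORE	30 000, 60 000	Total number of steps over which epsilon decays
BATCH	70	Number of samples per training mini-batch
memory_size	5 000	Upper limit of the replay memory
neuro_layer1	20	First hidden layer
neuro_layer2	64	Second hidden layer
INITIAL_EPSILON	0.01	Initial value of epsilon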
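As a rough illustration of how these settings might be grouped in code, the sketch below collects them in a configuration object and derives a linear epsilon schedule. The names `DQNConfig` and `epsilon_at` are hypothetical. Reading the two layer values as unit counts, choosing 30 000 rather than 60 000 for the decay length, using a linear decay, and the `final_epsilon` value are all assumptions, since the table does not state them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DQNConfig:
    gamma_max: float = 0.98        # discount rate from the table
    explore_steps: int = 30_000    # the table also lists 60 000
    batch_size: int = 70           # mini-batch size
    memory_size: int = 5_000       # replay memory upper limit
    neuro_layer1: int = 20         # ASSUMPTION: interpreted as units in hidden layer 1
    neuro_layer2: int = 64         # ASSUMPTION: interpreted as units in hidden layer 2
    initial_epsilon: float = 0.01  # initial epsilon from the table
    final_epsilon: float = 0.001   # ASSUMPTION: not given in the table


def epsilon_at(step, cfg):
    """Linearly interpolate epsilon from initial to final over cfg.explore_steps."""
    clipped = min(max(step, 0), cfg.explore_steps)
    frac = clipped / cfg.explore_steps
    return cfg.initial_epsilon + frac * (cfg.final_epsilon - cfg.initial_epsilon)


cfg = DQNConfig()
schedule = [epsilon_at(s, cfg) for s in (0, 15_000, 30_000, 45_000)]
```

After `explore_steps` steps the schedule stays at `final_epsilon`. Switching to the longer decay from the table only requires `DQNConfig(explore_steps=60_000)`.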
