Journal of System Simulation ›› 2025, Vol. 37 ›› Issue (7): 1753-1769. doi: 10.16182/j.issn1004731x.joss.25-0533
• Invited Reviews •
Chen Zhen, Wu Zhuoyi, Zhang Lin
Received: 2025-06-09
Revised: 2025-06-23
Online: 2025-07-18
Published: 2025-07-30
Contact: Zhang Lin
Chen Zhen, Wu Zhuoyi, Zhang Lin. Research on Policy Representation in Deep Reinforcement Learning[J]. Journal of System Simulation, 2025, 37(7): 1753-1769.
Table 1  Comparison of Different Policy Representation Architectures
| Policy architecture type | Input structure | Action output form | Typical application scenarios |
|---|---|---|---|
| MLP-based policy | State vector (optionally concatenated with a task label) | Softmax probabilities / Gaussian distribution | Simple tasks; continuous/discrete control problems |
| Pointer-network-based policy | State features + dynamic element sequence (e.g., cities, task points) | Attention indices pointing to input elements | Sorting, path planning, combinatorial optimization (TSP/CVRP) |
| Sequence-modeling-based policy | Trajectory sequence of returns, states, and actions (optionally with a prompt) | Autoregressive generation of the next action | Offline RL, conditional policy modeling, multi-task control |
| Diffusion-based policy | State features + random noise vector + (optional) goal/reward guidance | High-dimensional actions generated by reverse-diffusion sampling | Multimodal policy generation, offline RL, complex control |
| Hypernetwork-driven policy | State embedding + task/context vector | Policy-network parameters generated by a hypernetwork | Multi-task transfer, cooperative control, zero-shot generalization |
| Modular-structure-based policy | Local states + graph connectivity information (nodes/edges) | Joint action composed of each module's local policy output | Multi-agent systems, reconfigurable robots |
| Mixture-of-experts-based policy | State vector + context vector (task or environment prompt) | Gated fusion of multiple experts' actions | Multimodal tasks, non-stationary policy ensembles, long-horizon control |
| Serialized-token-based policy | Multimodal information unified into a token sequence | Actions generated autoregressively by a sequence model | Cross-modal tasks, multi-task generalization, unified policy learning and deployment |
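To make the first row of Table 1 concrete, the following is a minimal sketch of an MLP-based policy head for a discrete action space: a state vector passes through one hidden layer and a softmax produces action probabilities. All layer sizes, the `tanh` nonlinearity, and the function names are illustrative assumptions, not the architecture of any specific cited method.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(state_dim, hidden_dim, n_actions):
    """Random weights for a one-hidden-layer policy network (sketch only)."""
    return {
        "W1": rng.standard_normal((state_dim, hidden_dim)) * 0.1,
        "b1": np.zeros(hidden_dim),
        "W2": rng.standard_normal((hidden_dim, n_actions)) * 0.1,
        "b2": np.zeros(n_actions),
    }

def policy(params, state):
    """Map a state vector to a softmax distribution over discrete actions."""
    h = np.tanh(state @ params["W1"] + params["b1"])
    logits = h @ params["W2"] + params["b2"]
    z = np.exp(logits - logits.max())  # numerically stable softmax
    return z / z.sum()

params = init_mlp(state_dim=4, hidden_dim=32, n_actions=3)
probs = policy(params, rng.standard_normal(4))
print(probs)  # a length-3 probability vector summing to 1
```

For a continuous control problem, the same trunk would instead output the mean (and log standard deviation) of a Gaussian over actions, matching the "Gaussian distribution" output form in the table.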