Journal of System Simulation ›› 2025, Vol. 37 ›› Issue (7): 1753-1769. doi: 10.16182/j.issn1004731x.joss.25-0533

• Invited Reviews •

Research on Policy Representation in Deep Reinforcement Learning

Chen Zhen2,3, Wu Zhuoyi2,3, Zhang Lin1,2,3   

  1. Hangzhou International Innovation Institute, Beihang University, Hangzhou 311115, China
  2. School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China
  3. State Key Laboratory of Intelligent Manufacturing Systems Technology, Beijing 100854, China
  • Received: 2025-06-09  Revised: 2025-06-23  Online: 2025-07-18  Published: 2025-07-30
  • Contact: Zhang Lin

Abstract:

Deep reinforcement learning (DRL) has achieved remarkable success across a wide range of domains. Nevertheless, existing DRL policy networks still face significant challenges in generalizability, multi-task adaptability, and sample efficiency. Policy representation, a crucial research direction for enhancing DRL capabilities, aims to improve an agent's adaptability to environmental changes and novel tasks by constructing more efficient and generalizable forms of policy expression. This paper provides a concise overview of key research advances in policy representation. It introduces diverse policy architectures, ranging from traditional multi-layer perceptron (MLP)-based policies to those based on pointer networks, sequence generation models, diffusion models, hypernetworks, modular designs, mixture-of-experts models, and cross-modal policies built on serialized tokens. The paper then reviews cutting-edge research on policy representation methods, focusing on how semantic information in policy inputs and intermediate representations is encoded and optimized. It concludes with a summary and a discussion of prospects for future development.
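For orientation, the sketch below illustrates the traditional MLP-based policy representation that the review takes as its baseline: a state vector is mapped through fully connected layers to a distribution over discrete actions. This is an illustrative example, not code from the paper; the class name MLPPolicy and all dimensions are hypothetical.

```python
# Minimal sketch (illustrative only) of an MLP-based policy representation:
# state -> fully connected layers -> categorical distribution over actions.
import torch
import torch.nn as nn

class MLPPolicy(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, action_dim),  # logits over discrete actions
        )

    def forward(self, state: torch.Tensor) -> torch.distributions.Categorical:
        # Return an action distribution so the caller can sample or compute log-probs.
        return torch.distributions.Categorical(logits=self.net(state))

# Hypothetical dimensions chosen only for demonstration.
policy = MLPPolicy(state_dim=8, action_dim=4)
action = policy(torch.zeros(1, 8)).sample()
```

The architectures surveyed in the paper (pointer networks, diffusion models, hypernetworks, mixture-of-experts, token-based cross-modal policies) can be viewed as replacements for this simple state-to-distribution mapping that aim at better generalizability and multi-task adaptability.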

Key words: policy representation, deep reinforcement learning, generalizability, multi-task learning
