Journal of System Simulation, 2025, Vol. 37, Issue (12): 3212-3223. doi: 10.16182/j.issn1004731x.joss.25-FZ0691


Research on PAC-Bayes-Based A2C Algorithm for Multi-objective Reinforcement Learning

Liu Xiang, Jin Qiankun   

  1. School of Computer Science & Technology, Beijing Institute of Technology, Beijing 100081, China
  • Received: 2025-07-17  Revised: 2025-11-24  Online: 2025-12-26  Published: 2025-12-24
  • Contact: Jin Qiankun

Abstract:

To address the theoretical challenges of the exploration-exploitation trade-off and uncertainty modeling in multi-objective reinforcement learning (MORL), this study developed MO-PAC, a learning framework based on PAC-Bayes theory. By introducing a multi-objective stochastic Critic network and a dynamic preference mechanism, the framework extends the conventional A2C architecture, enabling adaptive and efficient approximation of complex Pareto fronts. Experimental results on multi-objective MuJoCo environments demonstrate that MO-PAC outperforms baseline algorithms, achieving an approximately 20% improvement in hypervolume and a 60% increase in expected utility, while exhibiting superior convergence efficiency and robustness. These results confirm both the theoretical value and the practical performance advantages of the framework in multi-objective decision-making. MO-PAC overcomes the theoretical limitations of existing methods in dynamic trade-off and uncertainty modeling, providing a new methodological foundation for advancing the MORL theoretical framework.
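The hypervolume indicator cited in the abstract measures the objective-space volume dominated by an approximated Pareto front relative to a reference point; a larger value indicates a front that is closer to, and spreads wider along, the true front. A minimal two-objective (maximization) sketch is given below; the example points and reference point are illustrative, not taken from the paper.

```python
def hypervolume_2d(front, ref):
    """Area dominated by `front` (list of (x, y) points, both objectives
    maximized) relative to reference point `ref` = (rx, ry), where rx and
    ry lie below all front values."""
    rx, ry = ref
    # Sweep points by x descending; dominated points are skipped
    # automatically because they never raise the running y-level.
    pts = sorted(front, key=lambda p: p[0], reverse=True)
    area = 0.0
    prev_y = ry
    for x, y in pts:
        if y > prev_y:  # this point contributes a new horizontal slab
            area += (x - rx) * (y - prev_y)
            prev_y = y
    return area

# Illustrative two-objective front, e.g. forward speed vs. energy
# efficiency in a multi-objective MuJoCo task.
front = [(1.0, 4.0), (2.0, 3.0), (3.0, 1.0)]
print(hypervolume_2d(front, ref=(0.0, 0.0)))  # → 8.0
```

For more than two objectives, exact hypervolume computation grows expensive and is typically delegated to a library implementation; the slab-sweep idea above is the two-dimensional special case.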

Key words: multi-objective optimization, multi-objective reinforcement learning, actor-critic algorithm, uncertainty, PAC-Bayes theory
