Journal of System Simulation ›› 2025, Vol. 37 ›› Issue (12): 3212-3223. DOI: 10.16182/j.issn1004731x.joss.25-FZ0691


Research on PAC-Bayes-Based A2C Algorithm for Multi-objective Reinforcement Learning

Liu Xiang, Jin Qiankun

  1. School of Computer Science & Technology, Beijing Institute of Technology, Beijing 100081, China
  • Received: 2025-07-17  Revised: 2025-11-24  Online: 2025-12-26  Published: 2025-12-24
  • Corresponding author: Jin Qiankun
  • First author: Liu Xiang (1999-), male, master's student; research interests: software intelligence and software engineering.



Abstract:

To address the theoretical challenges of the exploration-exploitation trade-off and uncertainty modeling in multi-objective reinforcement learning (MORL), this study develops MO-PAC, a learning framework based on PAC-Bayes theory. By introducing a multi-objective stochastic Critic network and a dynamic preference mechanism, the framework extends the conventional A2C architecture, enabling adaptive and efficient approximation of complex Pareto fronts. Experimental results show that in multi-objective MuJoCo environments, MO-PAC outperforms baseline algorithms, achieving an average improvement of about 20% in hypervolume and about 60% in expected utility, while exhibiting superior convergence efficiency and robustness. These results confirm its theoretical value and practical advantages in multi-objective decision-making. The MO-PAC framework overcomes the theoretical limitations of existing methods in dynamic trade-off and uncertainty modeling, providing a new methodological foundation for the MORL theory.
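The hypervolume indicator cited in the results measures the objective-space volume dominated by an approximated Pareto front relative to a reference point; larger is better. The paper's exact evaluation code is not reproduced here, but for two maximized objectives the indicator reduces to a sum of rectangular slabs, as in this minimal sketch (function names and the reference-point convention are illustrative assumptions):

```python
import numpy as np

def pareto_front(points):
    """Keep only non-dominated points (both objectives maximized)."""
    pts = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(pts):
        dominated = any(
            np.all(q >= p) and np.any(q > p)
            for j, q in enumerate(pts) if j != i
        )
        if not dominated:
            keep.append(p)
    return np.array(keep)

def hypervolume_2d(points, ref):
    """Area dominated by the Pareto front of `points`, above `ref`."""
    front = pareto_front(points)
    # Sorted by objective 1 descending, objective 2 is non-decreasing
    # along a non-dominated front, so the area decomposes into slabs.
    front = front[np.argsort(-front[:, 0])]
    hv, prev_y = 0.0, ref[1]
    for x, y in front:
        hv += (x - ref[0]) * (y - prev_y)
        prev_y = y
    return hv
```

For example, the front {(1, 3), (2, 2), (3, 1)} with reference point (0, 0) dominates an area of 6, and a dominated point such as (1, 1) does not change the value.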

Key words: multi-objective optimization, multi-objective reinforcement learning, actor-critic algorithm, uncertainty, PAC-Bayes theory
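In preference-conditioned actor-critic methods, a vector-valued critic estimates one value per objective, and a preference weight vector scalarizes the vector advantage for the policy update; sampling fresh preferences during training is one common way to cover the Pareto front. The sketch below illustrates this generic pattern only; MO-PAC's stochastic Critic and dynamic preference mechanism are defined in the paper and may differ:

```python
import numpy as np

def scalarized_advantage(vector_returns, vector_values, preference):
    """Vector advantage A = R - V (one entry per objective),
    scalarized by a preference weight vector w with w >= 0, sum(w) = 1."""
    w = np.asarray(preference, dtype=float)
    adv = np.asarray(vector_returns, dtype=float) - np.asarray(vector_values, dtype=float)
    return float(adv @ w)

def sample_preference(num_objectives, rng):
    """Draw a random preference from the probability simplex
    (Dirichlet with unit concentration); an illustrative stand-in
    for a learned or scheduled dynamic preference mechanism."""
    return rng.dirichlet(np.ones(num_objectives))
```

The scalarized advantage then multiplies the policy-gradient log-probability term exactly as in single-objective A2C, so the actor update itself is unchanged apart from the preference-dependent weighting.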
