| [1] |
Li Yuxi. Deep Reinforcement Learning: An Overview[EB/OL]. (2017-01-25)[2025-04-29].
|
| [2] |
Mossalam H, Assael Y M, Roijers D M, et al. Multi-objective Deep Reinforcement Learning[EB/OL]. (2016-10-09)[2025-04-29].
|
| [3] |
Watkins C J C H, Dayan P. Q-learning[J]. Machine Learning, 1992, 8(3): 279-292.
|
| [4] |
Mnih V, Kavukcuoglu K, Silver D, et al. Playing Atari with Deep Reinforcement Learning[EB/OL]. (2013-12-19)[2025-04-29].
|
| [5] |
Lockwood O, Si Mei. A Review of Uncertainty for Deep Reinforcement Learning[C]//Proceedings of the Eighteenth AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment. Palo Alto, CA, USA: AAAI Press, 2022: 155-162.
|
| [6] |
Ghavamzadeh M, Mannor S, Pineau J, et al. Bayesian Reinforcement Learning: A Survey[J]. Foundations and Trends® in Machine Learning, 2015, 8(5/6): 359-483.
|
| [7] |
Van Moffaert K, Drugan M M, Nowé A. Scalarized Multi-objective Reinforcement Learning: Novel Design Techniques[C]//2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL). Piscataway: IEEE, 2013: 191-199.
|
| [8] |
Ren Milong, He Zaikai, Zhang Haicang. Multi-objective Antibody Design with Constrained Preference Optimization[C]//ICLR 2025 Conference. New York: ICLR, 2025: 1-29.
|
| [9] |
Lillicrap T P, Hunt J J, Pritzel A, et al. Continuous Control with Deep Reinforcement Learning[C]//ICLR 2016. New York: ICLR, 2016: 1-14.
|
| [10] |
Fujimoto S, Van Hoof H, Meger D. Addressing Function Approximation Error in Actor-critic Methods[C]//Proceedings of the 35th International Conference on Machine Learning. Chia Laguna Resort: PMLR, 2018: 1587-1596.
|
| [11] |
Haarnoja T, Zhou A, Abbeel P, et al. Soft Actor-critic: Off-policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor[C]//Proceedings of the 35th International Conference on Machine Learning. Chia Laguna Resort: PMLR, 2018: 1861-1870.
|
| [12] |
Tasdighi B, Akgül A, Haussmann M, et al. PAC-Bayesian Soft Actor-critic Learning[C]//Proceedings of the 6th Symposium on Advances in Approximate Bayesian Inference. Chia Laguna Resort: PMLR, 2024: 127-145.
|
| [13] |
Ciosek K, Vuong Q, Loftin R, et al. Better Exploration with Optimistic Actor-critic[C]//Proceedings of the 33rd International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2019: 1787-1798.
|
| [14] |
Zhang Yu, Cai Peixiang, Pan Changyong, et al. Multi-agent Deep Reinforcement Learning-based Cooperative Spectrum Sensing with Upper Confidence Bound Exploration[J]. IEEE Access, 2019, 7: 118898-118906.
|
| [15] |
Shi Yucheng, Lynch D, Agapitos A. UCB-driven Utility Function Search for Multi-objective Reinforcement Learning[C]//Machine Learning and Knowledge Discovery in Databases. Research Track. Cham: Springer Nature Switzerland, 2026: 163-178.
|
| [16] |
Agarwal M, Aggarwal V, Lan Tian. Multi-objective Reinforcement Learning with Non-linear Scalarization[C]//Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems. Richland: International Foundation for Autonomous Agents and Multiagent Systems, 2022: 9-17.
|
| [17] |
Van Moffaert K, Drugan M M, Nowé A. Hypervolume-based Multi-objective Reinforcement Learning[C]//Evolutionary Multi-Criterion Optimization. Berlin: Springer Berlin Heidelberg, 2013: 352-366.
|
| [18] |
Qiu Shuang, Zhang Dake, Yang Rui, et al. Traversing Pareto Optimal Policies: Provably Efficient Multi-objective Reinforcement Learning[EB/OL]. (2024-07-24)[2025-04-29].
|
| [19] |
Tasdighi B, Werge N, Wu Yishan, et al. Probabilistic Actor-critic: Learning to Explore with PAC-Bayes Uncertainty[EB/OL]. (2024-02-05)[2025-04-29].
|
| [20] |
Basaklar T, Gumussoy S, Ogras U Y. PD-MORL: Preference-driven Multi-objective Reinforcement Learning Algorithm[EB/OL]. (2022-08-16)[2025-04-29].
|
| [21] |
Felten F, Alegre L N, Nowé A, et al. A Toolkit for Reliable Benchmarking and Research in Multi-objective Reinforcement Learning[C]//Proceedings of the 37th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2023: 23671-23700.
|
| [22] |
Xu Jie, Tian Yunsheng, Ma Pingchuan, et al. Prediction-guided Multi-objective Reinforcement Learning for Continuous Robot Control[C]//Proceedings of the 37th International Conference on Machine Learning. Chia Laguna Resort: PMLR, 2020: 10607-10616.
|
| [23] |
Alegre L N, Bazzan A L C, Roijers D M, et al. Sample-efficient Multi-objective Learning via Generalized Policy Improvement Prioritization[C]//Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems. Richland: International Foundation for Autonomous Agents and Multiagent Systems, 2023: 2003-2012.
|
| [24] |
Lu Haoye, Herman D, Yu Yaoliang. Multi-objective Reinforcement Learning: Convexity, Stationarity and Pareto Optimality[C]//ICLR 2023. New York: ICLR, 2023: 1-27.
|