Journal of System Simulation ›› 2025, Vol. 37 ›› Issue (12): 3212-3223. doi: 10.16182/j.issn1004731x.joss.25-FZ0691
• Papers •
Liu Xiang, Jin Qiankun
Received: 2025-07-17
Revised: 2025-11-24
Online: 2025-12-26
Published: 2025-12-24
Contact: Jin Qiankun
Liu Xiang, Jin Qiankun. Research on PAC-Bayes-Based A2C Algorithm for Multi-objective Reinforcement Learning[J]. Journal of System Simulation, 2025, 37(12): 3212-3223.
| [1] | Li Yuxi. Deep Reinforcement Learning: An Overview[EB/OL]. (2017-01-25)[2025-04-29]. |
| [2] | Mossalam H, Assael Y M, Roijers D M, et al. Multi-objective Deep Reinforcement Learning[EB/OL]. (2016-10-09)[2025-04-29]. |
| [3] | Watkins C J C H, Dayan P. Q-learning[J]. Machine Learning, 1992, 8(3): 279-292. |
| [4] | Mnih V, Kavukcuoglu K, Silver D, et al. Playing Atari with Deep Reinforcement Learning[EB/OL]. (2013-12-19)[2025-04-29]. |
| [5] | Lockwood O, Si Mei. A Review of Uncertainty for Deep Reinforcement Learning[C]//Proceedings of the Eighteenth AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment. Palo Alto, CA, USA: AAAI Press, 2022: 155-162. |
| [6] | Ghavamzadeh Mohammad, Mannor Shie, Pineau Joelle, et al. Bayesian Reinforcement Learning: A Survey[J]. Foundations and Trends® in Machine Learning, 2015, 8(5/6): 359-483. |
| [7] | Van Moffaert Kristof, Drugan Madalina M, Nowé Ann. Scalarized Multi-objective Reinforcement Learning: Novel Design Techniques[C]//2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL). Piscataway: IEEE, 2013: 191-199. |
| [8] | Ren Milong, He Zaikai, Zhang Haicang. Multi-objective Antibody Design with Constrained Preference Optimization[C]//ICLR 2025 Conference. New York: ICLR, 2025: 1-29. |
| [9] | Lillicrap T P, Hunt J J, Pritzel A, et al. Continuous Control with Deep Reinforcement Learning[C]//ICLR 2016. New York: ICLR, 2016: 1-14. |
| [10] | Fujimoto S, Hoof H, Meger D. Addressing Function Approximation Error in Actor-critic Methods[C]//Proceedings of the 35th International Conference on Machine Learning. Chia Laguna Resort: PMLR, 2018: 1587-1596. |
| [11] | Haarnoja T, Zhou A, Abbeel P, et al. Soft Actor-critic: Off-policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor[C]//Proceedings of the 35th International Conference on Machine Learning. Chia Laguna Resort: PMLR, 2018: 1861-1870. |
| [12] | Tasdighi B, Akgül Abdullah, Haussmann M, et al. PAC-Bayesian Soft Actor-critic Learning[C]//Proceedings of the 6th Symposium on Advances in Approximate Bayesian Inference. Chia Laguna Resort: PMLR, 2024: 127-145. |
| [13] | Ciosek K, Vuong Q, Loftin R, et al. Better Exploration with Optimistic Actor-critic[C]//Proceedings of the 33rd International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2019: 1787-1798. |
| [14] | Zhang Yu, Cai Peixiang, Pan Changyong, et al. Multi-agent Deep Reinforcement Learning-based Cooperative Spectrum Sensing with Upper Confidence Bound Exploration[J]. IEEE Access, 2019, 7: 118898-118906. |
| [15] | Shi Yucheng, Lynch David, Agapitos Alexandros. UCB-driven Utility Function Search for Multi-objective Reinforcement Learning[C]//Machine Learning and Knowledge Discovery in Databases. Research Track. Cham: Springer Nature Switzerland, 2026: 163-178. |
| [16] | Agarwal M, Aggarwal V, Lan Tian. Multi-objective Reinforcement Learning with Non-linear Scalarization[C]//Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems. Richland: International Foundation for Autonomous Agents and Multiagent Systems, 2022: 9-17. |
| [17] | Van Moffaert Kristof, Drugan Madalina M, Nowé Ann. Hypervolume-based Multi-objective Reinforcement Learning[C]//Evolutionary Multi-Criterion Optimization. Berlin: Springer Berlin Heidelberg, 2013: 352-366. |
| [18] | Qiu Shuang, Zhang Dake, Yang Rui, et al. Traversing Pareto Optimal Policies: Provably Efficient Multi-objective Reinforcement Learning[EB/OL]. (2024-07-24)[2025-04-29]. |
| [19] | Tasdighi B, Werge N, Wu Yishan, et al. Probabilistic Actor-critic: Learning to Explore with PAC-Bayes Uncertainty[EB/OL]. (2024-02-05)[2025-04-29]. |
| [20] | Basaklar T, Gumussoy S, Ogras U Y. PD-MORL: Preference-driven Multi-objective Reinforcement Learning Algorithm[C]. (2022-08-16)[2025-04-29]. |
| [21] | Felten Florian, Alegre Lucas N, Nowé Ann, et al. A Toolkit for Reliable Benchmarking and Research in Multi-objective Reinforcement Learning[C]//Proceedings of the 37th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2023: 23671-23700. |
| [22] | Xu Jie, Tian Yunsheng, Ma Pingchuan, et al. Prediction-guided Multi-objective Reinforcement Learning for Continuous Robot Control[C]//Proceedings of the 37th International Conference on Machine Learning. Chia Laguna Resort: PMLR, 2020: 10607-10616. |
| [23] | Alegre Lucas N, Bazzan Ana L C, Roijers Diederik M, et al. Sample-efficient Multi-objective Learning via Generalized Policy Improvement Prioritization[C]//Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems. Richland: International Foundation for Autonomous Agents and Multiagent Systems, 2023: 2003-2012. |
| [24] | Lu Haoye, Herman D, Yu Yaoliang. Multi-objective Reinforcement Learning: Convexity, Stationarity and Pareto Optimality[C]//ICLR 2023. New York: ICLR, 2023: 1-27. |
| [1] | Chen Juan, Zheng Wang, Liu Qianqian, Lu Bin. Automatic Multi-objective Optimization Based on Dynamic Storage Location Allocation Strategy [J]. Journal of System Simulation, 2025, 37(6): 1435-1448. |
| [2] | Jin Xurong, Yin Jiang, Yang Guohua, Li Wei, Wang Guobin, Wang Lele, Yang Na, Zhou Xuenian. Optimal Scheduling of Virtual Power Plant with Coupled Operation of CCS-P2G Considering Wind and Photovoltaic Uncertainty [J]. Journal of System Simulation, 2025, 37(5): 1129-1141. |
| [3] | Wu Zisong, Chang Daofang, Gai Yuchun. Optimization of Cargo Location Allocation in Four-way Shuttle Warehousing System Based on Two-stage Hybrid Algorithm [J]. Journal of System Simulation, 2025, 37(5): 1234-1245. |
| [4] | Guo Bo, Tie Ming, Fan Wenhui. Vibration Simulation and Multivariate Statistical Analysis Method of Composite Structures [J]. Journal of System Simulation, 2025, 37(3): 571-583. |
| [5] | Zhong Huaping, Fan Yubo, Shui Jijun, Wang Danhao, Peng Daogang. Optimal Scheduling of Integrated Energy Systems Considering Source-load Uncertainty and Linear Carbon Trading [J]. Journal of System Simulation, 2025, 37(10): 2485-2499. |
| [6] | Ding Xinhuan, Wang Huaqing, Dang Xu. Multi-objective Optimization of Signal Timing at Intersections Considering Tailpipe Emissions [J]. Journal of System Simulation, 2025, 37(10): 2687-2700. |
| [7] | Wang Ke, Guan Sijia, Xiyan Yin, Li Xixing, Tang Hongtao. Research on Mixed-model Assembly Line Balancing Optimization Based on Hybrid Genetic Tabu Search Algorithm [J]. Journal of System Simulation, 2025, 37(1): 167-182. |
| [8] | Li Feixing, Xing Lining, Zhou Yu. Adversarial Simulation Testing Algorithm for SVM Based on Multi-objective Evolutionary Optimization [J]. Journal of System Simulation, 2024, 36(9): 2016-2031. |
| [9] | Li Erchao, Zhang Shenghui. UAV Online Track Planning Based on DMOEA-APTC Algorithm [J]. Journal of System Simulation, 2024, 36(9): 2086-2099. |
| [10] | Bao Zhe, Li Wei, Zhang Xiaofang, An Zongyuan, Xu Ye. Study on Robust Chance Constrained Optimization of Multi-energy Supply System Based on Wind and Solar Power Combined Output Simulation [J]. Journal of System Simulation, 2024, 36(8): 1895-1913. |
| [11] | Zhang Wenqiang, Wang Xiaomeng, Zhang Xiaoxiao, Zhang Guohui. Hybrid Evolutionary Multi-objective Optimization Algorithm for Vehicle Routing Problem with Simultaneous Delivery and Pickup [J]. Journal of System Simulation, 2024, 36(8): 1914-1928. |
| [12] | Xie Xin, Hu Xiaobing, Zhou Hang. Research on Path Optimization Algorithm in Dynamic Routing Environment [J]. Journal of System Simulation, 2024, 36(8): 1969-1981. |
| [13] | Jiang Quan, Wei Jingxuan. Real-time Scheduling Method for Dynamic Flexible Job Shop Scheduling [J]. Journal of System Simulation, 2024, 36(7): 1609-1620. |
| [14] | Deng Mingjun, Hu Xinxia, Li Xiang, Xu Liping. Arterial Coordination Optimization Method Based on Vehicle Speed Guidance and Inductive Control [J]. Journal of System Simulation, 2024, 36(6): 1309-1321. |
| [15] | Wen Tingxin, Guan Tingyu. Hybrid Flow Shop Scheduling with Limited Buffers Considering Energy Consumption and Transportation [J]. Journal of System Simulation, 2024, 36(6): 1344-1358. |