Journal of System Simulation, 2025, Vol. 37, Issue (11): 2877-2887. doi: 10.16182/j.issn1004731x.joss.24-0604
Di Jian1,2, Wan Xue1, Jiang Limei1,3
Received: 2024-06-04
Revised: 2024-07-24
Online: 2025-11-18
Published: 2025-11-27
Contact: Jiang Limei
Di Jian, Wan Xue, Jiang Limei. Evolutionary Reinforcement Learning Based on Elite Instruction and Random Search[J]. Journal of System Simulation, 2025, 37(11): 2877-2887.
Table 1. Final performance of four algorithms in five environments

| Task | EDC-RL Mean | EDC-RL Median | EDC-RL Std | CEM-TD3 Mean | CEM-TD3 Median | CEM-TD3 Std | ERL Mean | ERL Median | ERL Std | TD3 Mean | TD3 Median | TD3 Std |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Swimmer-v2 | 365.85 | 365.09 | 1.8 | 151.97 | 138.10 | 105.36 | 152.14 | 149.54 | 67.77 | 83.22 | 77.15 | 28.32 |
| HalfCheetah-v2 | 12 122.18 | 12 130.03 | 237.75 | 10 875.90 | 10 789.85 | 706.70 | 5 753.22 | 5 201.25 | 1 031.90 | 10 500.78 | 10 452.88 | 419.51 |
| Walker2d-v2 | 4 679.29 | 4 402.94 | 400.48 | 4 126.37 | 4 151.47 | 477.87 | 3 984.24 | 4 278.72 | 635.45 | 3 488.45 | 3 571.11 | 461.32 |
| Hopper-v2 | 3 823.54 | 3 771.83 | 150.33 | 3 133.02 | 3 714.95 | 1 277.34 | 3 129.92 | 3 057.53 | 267.24 | 3 165.54 | 3 555.34 | 742.16 |
| Ant-v2 | 3 532.07 | 3 294.87 | 861.32 | 3 339.61 | 4 178.80 | 1 749.38 | 2 060.80 | 2 320.38 | 797.08 | 4 729.97 | 4 790.98 | 211.03 |

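
The comparison in Table 1 can be read off programmatically. The following is a minimal sketch, not part of the paper: it copies only the mean returns from the table and picks the algorithm with the highest mean in each environment. The names `MEAN_RETURN` and `best_by_mean` are hypothetical helpers introduced for illustration.

```python
# Mean returns copied from Table 1 (environment -> algorithm -> mean return).
# MEAN_RETURN and best_by_mean are illustrative names, not from the paper.
MEAN_RETURN = {
    "Swimmer-v2": {"EDC-RL": 365.85, "CEM-TD3": 151.97, "ERL": 152.14, "TD3": 83.22},
    "HalfCheetah-v2": {"EDC-RL": 12122.18, "CEM-TD3": 10875.90, "ERL": 5753.22, "TD3": 10500.78},
    "Walker2d-v2": {"EDC-RL": 4679.29, "CEM-TD3": 4126.37, "ERL": 3984.24, "TD3": 3488.45},
    "Hopper-v2": {"EDC-RL": 3823.54, "CEM-TD3": 3133.02, "ERL": 3129.92, "TD3": 3165.54},
    "Ant-v2": {"EDC-RL": 3532.07, "CEM-TD3": 3339.61, "ERL": 2060.80, "TD3": 4729.97},
}


def best_by_mean(table):
    """Return {environment: algorithm with the highest mean return}."""
    return {env: max(scores, key=scores.get) for env, scores in table.items()}


best = best_by_mean(MEAN_RETURN)
for env, algo in best.items():
    print(f"{env}: {algo}")
```

On these figures, EDC-RL has the highest mean return in four of the five environments, while TD3 has the highest mean on Ant-v2. The sketch ranks by mean only and ignores the median and standard deviation columns.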