Journal of System Simulation ›› 2025, Vol. 37 ›› Issue (11): 2877-2887. doi: 10.16182/j.issn1004731x.joss.24-0604

• Papers

基于精英指导和随机搜索的进化强化学习

邸剑1,2, 万雪1, 姜丽梅1,3   

  1. 华北电力大学(保定)计算机系,河北 保定 071003
  2. 河北省能源电力知识计算重点实验室,河北 保定 071003
  3. 复杂能源系统智能计算教育部工程研究中心,河北 保定 071003
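  • Received: 2024-06-04 Revised: 2024-07-24 Online: 2025-11-18 Published: 2025-11-27
  • Corresponding author: Jiang Limei
  • About the first author: Di Jian (1968-), male, senior engineer, master's degree; research interests: artificial intelligence and its applications, Internet of Things technology and applications.
  • Supported by:
    Fundamental Research Funds for the Central Universities, North China Electric Power University (2022MS102)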

Evolutionary Reinforcement Learning Based on Elite Instruction and Random Search

Di Jian1,2, Wan Xue1, Jiang Limei1,3   

  1. Department of Computer, North China Electric Power University (Baoding), Baoding 071003, China
  2. Hebei Key Lab of Knowledge Computing for Energy & Power, Baoding 071003, China
  3. Engineering Research Center of Intelligent Computing for Complex Energy Systems, Ministry of Education, Baoding 071003, China
  • Received: 2024-06-04 Revised: 2024-07-24 Online: 2025-11-18 Published: 2025-11-27
  • Contact: Jiang Limei

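Abstract:

To address the limited performance and scalability of evolutionary reinforcement learning caused by low sample efficiency, a single coupling mode, and poor convergence, an improved algorithm based on elite gradient instruction and double random search is proposed. First, elite policy gradient guidance carrying evolutionary information is introduced during reinforcement policy training, which corrects the direction of the reinforcement policy gradient update. Second, double random search replaces the original evolutionary component, which lowers algorithmic complexity while keeping the policy search in parameter space meaningful and controllable. Third, complete replacement information trading is introduced to balance learning and exploration between the reinforcement policy and the evolutionary policies. Experimental results show that, compared with classical evolutionary reinforcement learning methods, the proposed method achieves some improvement in exploration ability, robustness, and convergence.

Key words: evolutionary reinforcement learning, deep reinforcement learning, evolutionary algorithm, continuous control, elite gradient instruction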

Abstract:

Evolutionary reinforcement learning suffers from low sample efficiency, a single coupling mode, and poor convergence, which limit its performance and scalability. To address these issues, an improved algorithm based on elite gradient instruction and double random search is proposed. Elite policy gradient guidance carrying evolutionary information is introduced during reinforcement policy training to correct the direction of the reinforcement policy gradient update. Double random search replaces the original evolutionary component, reducing the complexity of the algorithm while making the policy search in the parameter space meaningful and controllable. Complete replacement information trading is introduced to balance the learning and exploration of the reinforcement and evolutionary policies. Experimental results show that the method improves exploration ability, robustness, and convergence compared with classical evolutionary reinforcement learning methods.
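The abstract does not spell out how the double random search works, so the following is only a generic sketch of random search in parameter space, not the authors' algorithm. It assumes a toy `fitness` function standing in for an episode return, and the names `random_search`, `sigma`, and `population` are hypothetical. The step size `sigma` is the knob that keeps the search controllable: each candidate is a Gaussian perturbation of the current elite, and the elite is replaced only when a candidate scores higher.

```python
import random


def fitness(params):
    # Hypothetical stand-in for an episode return: higher is better,
    # with the maximum (0.0) at the assumed target point (1.0, -2.0).
    target = (1.0, -2.0)
    return -sum((p - t) ** 2 for p, t in zip(params, target))


def random_search(initial, sigma=0.3, population=16, generations=200, seed=0):
    # Generic elitist random search in parameter space (illustrative only).
    rng = random.Random(seed)
    elite = list(initial)
    elite_fitness = fitness(elite)
    for _ in range(generations):
        for _ in range(population):
            # Perturb every parameter of the current elite with Gaussian noise.
            candidate = [p + rng.gauss(0.0, sigma) for p in elite]
            candidate_fitness = fitness(candidate)
            # Keep the candidate only if it strictly improves on the elite.
            if candidate_fitness > elite_fitness:
                elite, elite_fitness = candidate, candidate_fitness
    return elite, elite_fitness


start = [0.0, 0.0]
best, best_fitness = random_search(start)
```

In an actual evolutionary reinforcement learning setting, `params` would be the flattened weights of a policy network and `fitness` would be the return from environment rollouts; both are simplified here to keep the sketch self-contained.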

Key words: evolutionary reinforcement learning, deep reinforcement learning, evolutionary algorithm, continuous control, elite gradient instruction

CLC Number: