Journal of System Simulation ›› 2025, Vol. 37 ›› Issue (10): 2511-2521. doi: 10.16182/j.issn1004731x.joss.25-0529

• Simulation Technology for New Power Systems and Integrated Energy Systems •

Optimization Dispatch Method for High-proportion Renewable Energy Power Systems Based on SC-PPO

Xu Zhongkai1, Chu Chenyang1, Xie Kai1, Zhao Ruizhuo2, Ke Wenjun3

  1. NR Electric Co., Ltd., Nanjing 211102, China
  2. Beijing Institute of Computer Technology and Application, Beijing 100854, China
  3. School of Computer Science and Engineering, Southeast University, Nanjing 211189, China
  • Received: 2025-06-09 Revised: 2025-09-11 Online: 2025-10-20 Published: 2025-10-21
  • About the first author: Xu Zhongkai (1996-), male, engineer, master's degree; research interests include artificial intelligence.
  • Funding:
    NARI Group Science and Technology Project (GW2400040); Southeast University Start-up Research Fund for New Faculty (RF1028623234)

Abstract:

High-proportion renewable energy integration confronts power systems with pronounced randomness, multi-objective coupling, and security constraints, while traditional model-driven methods are limited in modeling accuracy and adaptability. To address these issues, this paper proposes a safety-constrained proximal policy optimization algorithm (safety-constrained PPO, SC-PPO) with three improvements. First, a temporal convolutional network is used to build a dynamic state encoder that fuses historical operation, real-time monitoring, and forecast data into a causal state representation. Second, a hierarchical reward structure is designed, and an adaptive weighting mechanism based on the degree of constraint satisfaction is introduced to coordinate multi-objective optimization. Third, physical constraint projection operators are embedded in the policy output layer, transforming constraints such as unit ramp rates, energy storage state of charge, and voltage magnitudes into feasible-region mappings in the action space. Simulation results show that SC-PPO reduces the number of voltage limit violations by 75% while raising the wind power accommodation rate to 95.6% and lowering carbon emissions to 15 220 t. The method offers a new intelligent decision-making paradigm that combines adaptability and security for power systems with a high proportion of renewable energy.

Key words: deep reinforcement learning, power system dispatch, dynamic state representation, hierarchical reward mechanism, security constraint embedding
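
The constraint projection idea summarized in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes the ramp-rate and state-of-charge limits reduce to simple interval (box) constraints, for which clipping is the exact Euclidean projection. All names (`UnitLimits`, `project_generator`, `project_storage`), the default limits, and the lossless storage model are hypothetical. Voltage magnitude constraints are left out because they depend on a network power-flow model that the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass
class UnitLimits:
    """Hypothetical per-unit limits; names and units are illustrative only."""
    p_min: float  # minimum output, MW
    p_max: float  # maximum output, MW
    ramp: float   # maximum output change per dispatch step, MW


def clip(x: float, lo: float, hi: float) -> float:
    """Project a scalar onto the interval [lo, hi]."""
    return max(lo, min(hi, x))


def project_generator(p_prev: float, p_raw: float, lim: UnitLimits) -> float:
    """Map a raw policy setpoint to the nearest one that respects the
    capacity and ramp limits. Assumes p_prev is itself feasible."""
    lo = max(lim.p_min, p_prev - lim.ramp)
    hi = min(lim.p_max, p_prev + lim.ramp)
    return clip(p_raw, lo, hi)


def project_storage(soc: float, p_raw: float, capacity_mwh: float,
                    dt_h: float, soc_min: float = 0.1, soc_max: float = 0.9,
                    p_rated: float = 50.0) -> float:
    """Map a raw storage power (positive = charging, MW) to the nearest
    value that keeps the next state of charge within [soc_min, soc_max].
    Assumed lossless model: soc_next = soc + p * dt_h / capacity_mwh."""
    p_hi = min(p_rated, (soc_max - soc) * capacity_mwh / dt_h)
    p_lo = max(-p_rated, (soc_min - soc) * capacity_mwh / dt_h)
    return clip(p_raw, p_lo, p_hi)
```

For example, with a unit at 100 MW, limits of 50 to 200 MW, and a 30 MW ramp limit, a raw setpoint of 150 MW is mapped to 130 MW, while a raw setpoint already inside the interval passes through unchanged. Applying such a mapping at the policy output means the agent never issues an infeasible action for these constraints, rather than being penalized for one afterwards.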

CLC number: