Optimization Dispatch Method for High-proportion Renewable Energy Power Systems Based on SC-PPO

doi:10.16182/j.issn1004731x.joss.25-0529

Abstract

High renewable energy penetration brings pronounced randomness, multi-objective coupling, and security-constraint challenges to power systems, while traditional model-driven methods are limited in modeling accuracy and adaptability. To address these issues, this paper proposes a safety-constrained proximal policy optimization algorithm (safety-constrained PPO, SC-PPO) with three improvements. First, a temporal convolutional network is used to build a dynamic state encoder that fuses historical operation, real-time monitoring, and forecast data into a causal state representation. Second, a hierarchical reward structure is designed, and an adaptive weighting mechanism based on the degree of constraint satisfaction is introduced to coordinate multi-objective optimization. Third, physical constraint projection operators are embedded in the policy output layer, converting constraints such as unit ramp rates, energy storage state of charge, and voltage magnitudes into feasible-region mappings of the action space. Simulation results show that SC-PPO reduces the number of voltage limit violations by 75% while raising the wind power accommodation rate to 95.6% and lowering carbon emissions to 15 220 t. The method provides a new paradigm of intelligent decision-making that combines adaptability and security for power systems with a high proportion of renewable energy.

Key words: deep reinforcement learning, power system dispatch, dynamic state representation, hierarchical reward mechanism, security constraint embedding

CLC number: TP391

Xu Zhongkai, Chu Chenyang, Xie Kai, et al. Optimization Dispatch Method for High-proportion Renewable Energy Power Systems Based on SC-PPO[J]. Journal of System Simulation, 2025, 37(10): 2511-2521.


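Table 3 Effect of time window length $h$ on dispatch performance

$h$	Wind power accommodation rate/%	Carbon emissions/t	Operating cost/10⁴ CNY	Voltage limit violations/(count per 72 h)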
6	93.2	15 890	412.6	4
9	94.8	15 560	409.1	3
12	95.6	15 220	406.3	2
15	95.1	15 430	408.7	3
18	94.1	15 680	411.4	4
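
The time window length $h$ in Table 3 is the amount of history fed to the temporal convolutional state encoder described in the abstract. The following is a minimal sketch, in plain Python, of the causal dilated convolution that such an encoder is typically built from; it is not the authors' implementation. The function names, the kernel weights, the zero left padding, and the one-convolution-per-layer receptive-field formula are all assumptions made for illustration.

```python
def causal_conv1d(x, kernel, dilation=1):
    """Causal dilated 1-D convolution: y[t] uses only x[t], x[t-d], x[t-2d], ...

    Positions before the start of the window are treated as zeros
    (left padding), so no output ever depends on a future sample.
    Illustrative helper, not taken from the paper.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size, dilations):
    """Time steps (including the current one) seen by stacked causal layers.

    Assumes one convolution per layer; residual blocks with two
    convolutions per block would see a longer history.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical configuration: kernel size 3 with dilations 1, 2, 4.
window = [float(v) for v in range(12)]   # a 12-step history window
features = causal_conv1d(window, [0.5, 0.3, 0.2], dilation=2)
span = receptive_field(3, [1, 2, 4])
```

Under these assumptions, a stack with kernel size 3 and dilations 1, 2, 4 sees 15 time steps, which is enough to cover the 12-step window that performs best in Table 3.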



Experimental setting	Wind power accommodation rate/%	Carbon emissions/t	Operating cost/10⁴ CNY	Voltage limit violations/(count per 72 h)
SC-PPO	95.6	15 220	406.3	2
w/o TCN	91.7	16 090	418.6	6
w/o HR	92.5	15 830	415.4	4
w/o CP	93.1	15 620	412.9	9
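
The "w/o HR" row presumably corresponds to removing the hierarchical reward with adaptive weights described in the abstract. The page does not give the reward formula, so the sketch below only illustrates the general idea of raising the weight of the safety term as the degree of constraint satisfaction falls. The three-layer split (safety, economy, environment), the `gain` parameter, and the assumption that all inputs are pre-normalized to [0, 1] are hypothetical choices, not the authors' design.

```python
def adaptive_weights(satisfaction, base=(1.0, 1.0, 1.0), gain=4.0):
    """Return normalized weights (safety, economy, environment).

    satisfaction: assumed share of constraints currently satisfied, in [0, 1].
    The safety weight grows linearly as satisfaction falls; the three
    weights always sum to 1. `base` and `gain` are hypothetical parameters.
    """
    if not 0.0 <= satisfaction <= 1.0:
        raise ValueError("satisfaction must lie in [0, 1]")
    w_safe = base[0] * (1.0 + gain * (1.0 - satisfaction))
    raw = (w_safe, base[1], base[2])
    total = sum(raw)
    return tuple(w / total for w in raw)


def layered_reward(violation, cost, emission, curtailment, satisfaction):
    """Weighted sum of negative penalties, one term per reward layer.

    All penalty inputs are assumed to be pre-normalized to [0, 1].
    """
    w_safe, w_econ, w_env = adaptive_weights(satisfaction)
    r_safe = -violation                      # safety layer
    r_econ = -cost                           # economic layer
    r_env = -(emission + curtailment) / 2.0  # environmental layer
    return w_safe * r_safe + w_econ * r_econ + w_env * r_env


# With every constraint satisfied the three layers are weighted equally;
# at 50% satisfaction the safety layer dominates.
balanced = adaptive_weights(1.0)
stressed = adaptive_weights(0.5)
```

With these assumed parameters the safety weight rises from one third to 0.6 when satisfaction drops from 1.0 to 0.5, so the policy is penalized more for violations when the system is already close to its limits.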

SoC bounds	Voltage bounds	Wind power accommodation rate/%	Voltage limit violations/(count per 72 h)	Operating cost/10⁴ CNY
[15%, 85%]	[0.93, 1.07]	96.2	6	404.1
[20%, 80%]	[0.95, 1.05]	95.6	2	406.3
[25%, 75%]	[0.96, 1.04]	94.8	1	409.7
[30%, 70%]	[0.97, 1.03]	93.4	0	413.2
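
The bounds in this table are the kind of limits that the constraint projection operator in the policy output layer maps onto the action space. As a minimal sketch of that idea, and not the authors' operator, the functions below clip a raw action to the interval allowed by ramp-rate and state-of-charge limits. The function names, the lossless storage model, the one-hour step `dt`, and the sign convention (positive power means charging) are assumptions; the default SoC bounds of 20% and 80% are taken from the second row of the table. Voltage limits depend on the power flow solution rather than on a single action component, so they are left out of this sketch.

```python
def clip(value, lower, upper):
    """Clamp value to [lower, upper]; returns lower if the interval is empty."""
    return max(lower, min(value, upper))


def project_unit_output(p_raw, p_prev, ramp_limit, p_min, p_max):
    """Project a unit's raw output onto its ramp and capacity limits.

    The feasible interval is the intersection of [p_min, p_max] with
    [p_prev - ramp_limit, p_prev + ramp_limit].
    """
    lower = max(p_min, p_prev - ramp_limit)
    upper = min(p_max, p_prev + ramp_limit)
    return clip(p_raw, lower, upper)


def project_storage_power(p_raw, soc, capacity, dt=1.0,
                          soc_min=0.20, soc_max=0.80):
    """Project storage power so the next SoC stays within [soc_min, soc_max].

    Positive power charges the storage. Assumes a lossless device with
    next_soc = soc + p * dt / capacity (hypothetical simplification).
    """
    max_charge = (soc_max - soc) * capacity / dt
    max_discharge = (soc - soc_min) * capacity / dt
    return clip(p_raw, -max_discharge, max_charge)


# Hypothetical numbers: a unit at 100 MW with a 10 MW ramp limit,
# and a 100 MWh storage at 75% SoC.
unit_mw = project_unit_output(120.0, 100.0, 10.0, 20.0, 150.0)
storage_mw = project_storage_power(50.0, 0.75, 100.0)
```

Because the projection is applied after the policy produces its raw action, tighter bounds shrink the feasible interval directly, which is consistent with the trade-off in the table: fewer voltage limit violations at the price of lower wind power accommodation and higher operating cost.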