Journal of System Simulation, 2026, Vol. 38, Issue (5): 1408-1425. doi: 10.16182/j.issn1004731x.joss.25-0456

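  • About the first author: Li Yanbin (born 1964), male, professor, Ph.D.; research interests: energy economics and management, and electricity market operation.
  • Supported by:
    National Natural Science Foundation of China, Youth Program (72404087); Natural Science Foundation of Hebei Province, Youth Program (G2024502011)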

Capacity Market Trading Strategies of Generators Based on PER-MADDPG Algorithm

Li Yanbin, Pan Zhaolun, Ma Xinyue, Song Minghao, Hu Yujie, Xue Xiaoda   

  1. School of Economics and Management, North China Electric Power University, Beijing 102206, China
  • Received:2025-05-21 Revised:2025-08-15 Online:2026-05-21 Published:2026-05-29
  • Contact: Pan Zhaolun

Abstract:

To address how generators should trade off quantity and price bids to maximize revenue under different capacity market environments, a capacity market bidding equilibrium model is constructed. Because traditional solution methods rely on the complete-information assumption and make poor use of historical trading strategy information, a capacity market trading simulation method based on prioritized experience replay multi-agent deep deterministic policy gradient (PER-MADDPG) is proposed. The action space is built from the quantity and price bidding strategies, and the state space is built from historical trading strategies and bid-winning information. Using only this limited state information, each generator applies the prioritized experience replay mechanism, which assigns sampling probabilities according to each sample's temporal-difference error so that samples with larger errors are replayed more often during training. This effectively addresses the amplification of gradient noise caused by non-stationary interaction among multiple agents and improves sample utilization efficiency and model convergence speed. Market simulation results show that, compared with the MADDPG, MAPPO, MASAC, MATD3, and QMIX algorithms, the proposed method increases the generators' average reward by 2 853.08, 3 628.74, 2 167.11, 4 260.19, and 5 459.64 yuan, and shortens the average algorithm runtime by 15.35%, 8.18%, 3.87%, 5.33%, and 31.03%, respectively. The proposed method can help generators formulate optimal capacity market trading strategies under different market environments and increase their capacity revenue. It can also serve as a reference for China's capacity market designers when selecting a clearing price mechanism, thereby reducing the grid's capacity procurement cost.

Key words: capacity market, generator, bidding equilibrium model, PER-MADDPG algorithm, trading strategy
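
The replay mechanism described in the abstract, in which sampling probability follows the temporal-difference (TD) error, can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the sketch assumes the standard proportional form of prioritized experience replay: priority p_i = (|δ_i| + ε)^α, sampling probability P(i) = p_i / Σ_k p_k, and importance-sampling weight w_i = (N · P(i))^(−β) normalized by its maximum. The function names, the values of `alpha`, `beta`, and `eps`, and the toy buffer are all hypothetical and are not taken from the paper.

```python
import random


def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Turn TD errors into sampling probabilities (assumed proportional PER).

    alpha = 0 gives uniform sampling; larger alpha favors large-error samples.
    eps keeps zero-error samples from never being replayed.
    """
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def importance_weights(probs, beta=0.4):
    """Importance-sampling weights that offset the non-uniform sampling bias."""
    n = len(probs)
    weights = [(n * p) ** (-beta) for p in probs]
    w_max = max(weights)
    return [w / w_max for w in weights]  # normalized so the largest weight is 1


def sample_batch(buffer, td_errors, batch_size, alpha=0.6, beta=0.4, rng=None):
    """Draw a batch (with replacement) according to the PER probabilities."""
    rng = rng or random.Random()
    probs = per_probabilities(td_errors, alpha)
    weights = importance_weights(probs, beta)
    idx = rng.choices(range(len(buffer)), weights=probs, k=batch_size)
    return [buffer[i] for i in idx], [weights[i] for i in idx], idx


# Toy buffer: four hypothetical transitions with their current TD errors.
toy_buffer = ["t0", "t1", "t2", "t3"]
toy_td_errors = [0.1, 2.0, 0.5, 4.0]
batch, batch_weights, batch_idx = sample_batch(
    toy_buffer, toy_td_errors, batch_size=8, rng=random.Random(0)
)
```

In a MADDPG-style training loop, each sampled transition's critic loss would typically be multiplied by its importance weight, and the stored TD errors would be refreshed after every update. Whether the paper follows exactly this scheme is not stated in the abstract.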

CLC number: