Journal of System Simulation, 2026, Vol. 38, Issue 2: 476-487. doi: 10.16182/j.issn1004731x.joss.25-0728

• Physical Application Scenarios •

Distributed Optimization for Integrated Energy Based on Multi-agent Reinforcement Learning

Tao Caixia, Chen Naikun, Gao Fengyang, Zhang Jiangang

  1. School of Automation and Electrical Engineering, Lanzhou Jiaotong University, Lanzhou 730070, Gansu, China
  • Received: 2025-07-27 Revised: 2025-10-16 Online: 2026-02-18 Published: 2026-02-11
  • Corresponding author: Chen Naikun
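  • First author: Tao Caixia (1972-), female, professor, M.S.; research interests: electric machines and their control, and energy system management.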
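  • Funding:
    National Natural Science Foundation of China (62463016); Gansu Provincial Science and Technology Program (23JRRA880); Gansu Provincial Key Research and Development Program (23YFFA0059); Key Project of the Natural Science Foundation of Gansu Province (23JRRA882); Gansu Provincial University Industry Support and Guidance Project (2024CYZC-23)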

Abstract:

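To address the energy-management and privacy-preservation problems in the coordinated optimization of distributed integrated energy systems, a distributed coordinated optimization strategy based on the multi-agent proximal policy optimization (MAPPO) algorithm is proposed. An energy management model is formulated within the Markov decision process (MDP) framework. Taking the heterogeneous characteristics of electrical and thermal energy into account, a multi-region two-layer interaction mechanism is constructed. Under the centralized-training, decentralized-execution framework, homomorphic encryption is used to prevent privacy leakage during coordination, while individual agents' contributions are quantified precisely, which mitigates the sharp growth of variance in multi-agent policy evaluation. In hourly system scheduling, the optimal strategy is sought with minimum daily cost as the objective function. Simulation results show that the algorithm can be trained adaptively on large amounts of historical data to derive the optimal strategy, and that it reduces the operating cost of every region simultaneously while satisfying engineering constraints.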

Key words: integrated energy, multi-region energy management, multi-agent algorithm, deep reinforcement learning, distributed optimization
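The abstract names MAPPO under centralized training and decentralized execution but does not state the loss being optimized. For orientation only, the block below gives the standard PPO clipped surrogate as it is commonly applied per agent in MAPPO. Treating each region as agent $i$ whose actor $\pi_{\theta^i}$ sees only its local observation $o_t^i$, with a critic $V_{\phi^i}$ that sees the global state $s_t$ during training, is an assumption about how the strategy maps onto this framework, not the paper's stated formulation. The reward $r_k^i = -C_k^i$ (negative hourly operating cost of region $i$) is likewise assumed; with $\gamma = 1$ the return from hour 1 equals the negative daily cost, which ties the objective to the minimum-daily-cost goal. $\hat{A}_t^i$ is an advantage estimate computed from the centralized critic, and $\epsilon$ is the clip range.

```latex
\begin{aligned}
\rho_t^i(\theta^i) &= \frac{\pi_{\theta^i}(a_t^i \mid o_t^i)}{\pi_{\theta^i_{\mathrm{old}}}(a_t^i \mid o_t^i)}, \\
L^{\mathrm{CLIP}}(\theta^i) &= \mathbb{E}_t\!\left[\min\!\Big(\rho_t^i(\theta^i)\,\hat{A}_t^i,\;
  \operatorname{clip}\big(\rho_t^i(\theta^i),\,1-\epsilon,\,1+\epsilon\big)\,\hat{A}_t^i\Big)\right], \\
L^{V}(\phi^i) &= \mathbb{E}_t\!\left[\big(V_{\phi^i}(s_t) - \hat{R}_t^i\big)^2\right], \\
\hat{R}_t^i &= \sum_{k=t}^{24} \gamma^{\,k-t}\, r_k^i, \qquad r_k^i = -C_k^i .
\end{aligned}
```

Only $\pi_{\theta^i}$ is needed at execution time, which is what makes the execution decentralized; $V_{\phi^i}(s_t)$ is used only while training.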

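The abstract relies on homomorphic encryption to keep regional data private during coordination but does not name the scheme. The sketch below assumes an additively homomorphic scheme, Paillier, chosen here purely for illustration, and a hypothetical setup in which each region encrypts an integer quantity (for example, its hourly heat demand in kW) and an aggregator multiplies ciphertexts to obtain the encrypted total without seeing any individual value. The primes are far too small to be secure, who holds the private key is an assumption, and all function names and values are hypothetical.

```python
import math
import random

# Toy Paillier cryptosystem (additively homomorphic), for illustration only.
# WARNING: the primes used below are tiny for readability; this is NOT secure.

def generate_keypair(p: int, q: int):
    """Return (public_key, private_key) built from two distinct primes p, q."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lambda = lcm(p-1, q-1)
    g = n + 1                                          # common choice of generator
    n_sq = n * n
    # mu = L(g^lambda mod n^2)^(-1) mod n, with L(x) = (x - 1) // n
    l_value = (pow(g, lam, n_sq) - 1) // n
    mu = pow(l_value, -1, n)                           # modular inverse (Python 3.8+)
    return (n, g), (lam, mu)

def encrypt(public_key, m: int, rng: random.Random) -> int:
    """Encrypt a non-negative integer m < n using fresh randomness r."""
    n, g = public_key
    n_sq = n * n
    if not 0 <= m < n:
        raise ValueError("plaintext must satisfy 0 <= m < n")
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:                         # r must be invertible mod n
        r = rng.randrange(1, n)
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq

def decrypt(public_key, private_key, c: int) -> int:
    """Recover the plaintext; requires the private key (lambda, mu)."""
    n, _ = public_key
    lam, mu = private_key
    n_sq = n * n
    return ((pow(c, lam, n_sq) - 1) // n * mu) % n

def add_encrypted(public_key, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    n, _ = public_key
    return (c1 * c2) % (n * n)

def aggregate_demo() -> int:
    rng = random.Random(0)
    public_key, private_key = generate_keypair(1009, 1013)
    # Hypothetical hourly heat demands of three regions, in integer kW.
    demands = [1250, 830, 2975]
    ciphertexts = [encrypt(public_key, m, rng) for m in demands]
    # The aggregator only ever sees and combines ciphertexts.
    total = ciphertexts[0]
    for c in ciphertexts[1:]:
        total = add_encrypted(public_key, total, c)
    # Only the private-key holder can recover the aggregate.
    return decrypt(public_key, private_key, total)

if __name__ == "__main__":
    print(aggregate_demo())
```

Here `aggregate_demo()` returns 5055, the sum 1250 + 830 + 2975, even though the aggregator only multiplied ciphertexts. Paillier supports addition of plaintexts, so it fits aggregation-style coordination; nonlinear coordination steps would need a different protocol.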