Distributed Optimization for Integrated Energy Based on Multi-agent Reinforcement Learning

doi:10.16182/j.issn1004731x.joss.25-0728

Abstract

To address the energy management and privacy protection problems that arise in the coordinated optimization of distributed integrated energy systems, a distributed coordinated optimization strategy based on the multi-agent proximal policy optimization (MAPPO) algorithm is proposed. An energy management model is built under the Markov decision process (MDP) framework, and a multi-region two-layer interaction mechanism is constructed that accounts for the heterogeneous characteristics of electrical and thermal energy. Under the centralized training and decentralized execution framework, homomorphic encryption is used to avoid privacy leakage during coordination, and individual contributions are accurately quantified to mitigate the sharp growth in variance of multi-agent policy evaluation. For hourly system scheduling, the optimal strategy is sought with minimum daily cost as the objective function. Simulation results show that the algorithm can train adaptively on a large amount of historical data, derive the optimal strategy, and reduce the operating costs of all regions simultaneously while satisfying engineering constraints.

Key words: integrated energy, multi-region energy management, multi-agent algorithm, deep reinforcement learning, distributed optimization
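As a rough illustration of the policy update underlying MAPPO, the sketch below evaluates the standard PPO clipped surrogate objective separately for each agent, using advantages that a centralized critic would supply during training. The function name `clipped_surrogate`, the agent labels, the clipping parameter `eps = 0.2`, and all numeric values are assumptions made for illustration; the paper's contribution-quantification and homomorphic-encryption steps are not reproduced here.

```python
from typing import Dict, List


def clipped_surrogate(ratios: List[float], advantages: List[float], eps: float = 0.2) -> float:
    """Mean PPO clipped surrogate objective over a batch (to be maximized)."""
    if len(ratios) != len(advantages) or not ratios:
        raise ValueError("ratios and advantages must be non-empty and equal in length")
    total = 0.0
    for r, a in zip(ratios, advantages):
        # Restrict the probability ratio to [1 - eps, 1 + eps].
        r_clipped = max(1.0 - eps, min(1.0 + eps, r))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(r * a, r_clipped * a)
    return total / len(ratios)


# Hypothetical per-agent batches: ratios pi_new/pi_old and advantage estimates.
batches: Dict[str, Dict[str, List[float]]] = {
    "region_1": {"ratios": [1.5, 0.5], "advantages": [1.0, -1.0]},
    "region_2": {"ratios": [1.0, 1.1], "advantages": [2.0, 0.5]},
}

objectives = {
    name: clipped_surrogate(b["ratios"], b["advantages"]) for name, b in batches.items()
}
for name, value in objectives.items():
    print(f"{name}: clipped objective = {value:.3f}")
```

Clipping the ratio to [1 - eps, 1 + eps] caps how far a single update can move each agent's policy, which is the property PPO-style methods rely on for stable training.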

CLC number:

TP391

Tao Caixia, Chen Naikun, Gao Fengyang, et al. Distributed Optimization for Integrated Energy Based on Multi-agent Reinforcement Learning[J]. Journal of System Simulation, 2026, 38(2): 476-487.

Figures/Tables: 15 (Figures 1-11, Tables 1-4)

Algorithm	Maximum/CNY	Minimum/CNY	Mean/CNY	Standard deviation
MAPPO	370 072	284 972	309 508	1.28
PPO	412 563	276 501	351 470	2.27
TD3	449 012	254 786	378 012	2.31
MATD3	399 201	305 210	329 087	1.76

Method	Average daily cost/CNY	Training time/s	Computation time/s
MAPPO	304 083	1 123	0.013 2
ADMM	304 904	0	25
MILP	305 664	0	32
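To make the trade-off in the comparison table above easier to read, the following sketch computes the relative daily-cost saving and the online computation speedup of MAPPO against ADMM and MILP. The numbers are copied from the table; the dictionary layout and the helper names `cost_saving_pct` and `speedup` are illustrative assumptions.

```python
# Values copied from the comparison table (costs in CNY, times in seconds).
results = {
    "MAPPO": {"daily_cost": 304083, "training_time": 1123.0, "solve_time": 0.0132},
    "ADMM": {"daily_cost": 304904, "training_time": 0.0, "solve_time": 25.0},
    "MILP": {"daily_cost": 305664, "training_time": 0.0, "solve_time": 32.0},
}


def cost_saving_pct(baseline: str, candidate: str = "MAPPO") -> float:
    """Relative reduction in average daily cost versus the baseline, in percent."""
    b = results[baseline]["daily_cost"]
    c = results[candidate]["daily_cost"]
    return 100.0 * (b - c) / b


def speedup(baseline: str, candidate: str = "MAPPO") -> float:
    """Ratio of online computation times (baseline divided by candidate)."""
    return results[baseline]["solve_time"] / results[candidate]["solve_time"]


for name in ("ADMM", "MILP"):
    print(f"vs {name}: cost saving {cost_saving_pct(name):.2f} %, "
          f"online speedup about {speedup(name):.0f}x")
```

On these figures the cost differences are well under 1 %, while the online computation time differs by about three orders of magnitude. The speedup covers online computation only: the MAPPO row also reports 1 123 s of training time, which ADMM and MILP do not require.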

Fluctuation range/%	Proposed method	Stochastic optimization method
0	309 508	334 156
5	312 912.6	343 846.5
15	315 079.1	349 193
20	316 007.7	356 544.5
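One way to read the fluctuation table above is to express each entry as a percentage increase over the same method's value at 0 % fluctuation. The sketch below does this with the tabulated numbers; the variable names and the helper `increase_pct` are illustrative assumptions.

```python
# Values copied from the fluctuation table, keyed by fluctuation range in percent.
proposed = {0: 309508.0, 5: 312912.6, 15: 315079.1, 20: 316007.7}
stochastic = {0: 334156.0, 5: 343846.5, 15: 349193.0, 20: 356544.5}


def increase_pct(series: dict) -> dict:
    """Percentage increase of each entry relative to the 0 % fluctuation entry."""
    base = series[0]
    return {level: 100.0 * (value - base) / base for level, value in series.items()}


proposed_inc = increase_pct(proposed)
stochastic_inc = increase_pct(stochastic)

for level in sorted(proposed):
    print(f"{level:>2} % fluctuation: proposed +{proposed_inc[level]:.1f} %, "
          f"stochastic optimization +{stochastic_inc[level]:.1f} %")
```

With the tabulated values, the proposed method's figure rises by about 2.1 % at 20 % fluctuation, against about 6.7 % for the stochastic optimization method.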