Journal of System Simulation, 2018, Vol. 30, Issue (4): 1473-1481. doi: 10.16182/j.issn1004731x.joss.201804032

• Simulation Application Engineering •

Parallel Pattern Recognition of Leak Current Data Using Spark-KNN

Li Li, Zhu Yongli, Song Yaqi

  1. School of Control and Computer Engineering, North China Electric Power University, Baoding, Hebei 071003, China
  • Received: 2016-05-11 Revised: 2016-07-15 Online: 2018-04-08 Published: 2019-01-04
  • About the author: Li Li (b. 1980), female, from Chongqing, holds a master's degree and is a lecturer; her research focuses on applying modern signal processing methods to power system fault diagnosis and related problems.
  • Funding:
    National Natural Science Foundation of China (51677072); Natural Science Foundation of Hebei Province (A2016502001); Fundamental Research Funds for the Central Universities (2018MS074)

Abstract: With the rapid development of the smart grid, condition monitoring data from power grid equipment are growing exponentially and gradually forming big data. Traditional computing architectures can no longer meet the resulting performance demands. Combining Spark big data processing with the Aliyun E-MapReduce cloud computing platform, this paper proposes a parallel pattern recognition method for power grid equipment condition monitoring big data, aiming to improve the ability of online monitoring systems to analyse, quickly and in batches, alarm monitoring data that surge within a short time. A Spark-based parallel k-nearest neighbor (KNN) classification algorithm, Spark-KNN, is designed and used for parallel pattern recognition of massive insulator leakage current data. Experimental results show that the average performance of Spark-KNN is 2.97 times that of the Hadoop MapReduce implementation, with a maximum speedup of 8.8, making it better suited to real-time processing of power equipment monitoring big data.

Key words: power grid equipment, online monitoring, big data, Spark
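
The abstract does not spell out how Spark-KNN distributes its work, so the following is only a minimal sketch of one common way to parallelise KNN, not the authors' implementation. It assumes the labelled training samples are split into partitions, that each partition reports its own k nearest candidates (the role `mapPartitions` would play on a Spark RDD), and that the candidate lists are then merged pairwise (the role of `reduce`). Plain Python lists stand in for the partitions so the example runs without a cluster; the function names, the toy feature vectors and the labels "normal" and "alarm" are all hypothetical.

```python
import heapq
from collections import Counter
from functools import reduce
from math import dist  # Euclidean distance, Python 3.8+


def local_top_k(partition, query, k):
    """Map step: the k nearest (distance, label) pairs within one partition."""
    return heapq.nsmallest(k, ((dist(x, query), label) for x, label in partition))


def merge_top_k(a, b, k):
    """Reduce step: merge two candidate lists, keeping the k nearest overall."""
    return heapq.nsmallest(k, a + b)


def knn_classify(partitions, query, k):
    """Classify `query` by majority vote among its k nearest neighbours."""
    candidates = [local_top_k(p, query, k) for p in partitions]
    nearest = reduce(lambda a, b: merge_top_k(a, b, k), candidates)
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two partitions of (feature vector, label) samples.
partitions = [
    [((0.0, 0.0), "normal"), ((0.1, 0.2), "normal"), ((5.0, 5.0), "alarm")],
    [((0.2, 0.1), "normal"), ((5.1, 4.9), "alarm"), ((4.8, 5.2), "alarm")],
]
result = knn_classify(partitions, (4.9, 5.0), k=3)
print(result)
```

Because each partition forwards at most k candidates, the data exchanged between the map and reduce steps stays small regardless of how large the training set is, which is one reason this pattern suits in-memory engines such as Spark.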
