Abstract
This software is a framework for scalable analysis of large-scale power grid data sets. The framework consists of a statistical analysis package, such as R, running in a robust parallel environment, such as a Hadoop cluster. The analysis package is used to define rules that identify subsets of the data that are of interest, for example bad data or data indicating events of interest. These rules can be combined in arbitrary ways; for example, multiple rules may be required to remove all erroneous data from the original data set. The rules can also be translated into a more efficient encoding, such as a Java program. When events of interest are identified, they are classified into known event types, and the event metadata, together with references to the underlying data, is stored in a relational database. These higher-level metadata descriptions of the events can then be used to respond quickly to queries from users or other applications, or the information can be displayed in a visual format. Unlike most traditional analysis techniques, which operate over a subset of the data, this framework can analyze complete large-scale power grid data sets, such as the phasor measurement unit (PMU) or FFT data generated by smart grid deployments, which enables a more complete analysis. We have used this framework to identify novel rules that detect erroneous data in PMU data sets. We have also developed rules for identifying events of interest, such as generator trips and islanding events.
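The idea of combining rules to filter erroneous data can be illustrated with a minimal sketch. It is written in plain Python for brevity and is not the framework's implementation, which the abstract describes as a statistical package such as R running over a parallel environment. The `Sample` fields, the rule names, and the 55 to 65 Hz range are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Sample:
    """One PMU-like measurement. All fields are hypothetical."""
    timestamp: float     # seconds since some reference time
    frequency_hz: float  # reported grid frequency
    status_ok: bool      # device self-reported status flag


# A rule is a predicate that returns True when a sample is "of interest"
# (here: suspected to be erroneous).
Rule = Callable[[Sample], bool]


def out_of_range(low: float, high: float) -> Rule:
    """Flag samples whose frequency falls outside [low, high]."""
    return lambda s: not (low <= s.frequency_hz <= high)


def bad_status() -> Rule:
    """Flag samples whose device status flag is not OK."""
    return lambda s: not s.status_ok


def any_of(*rules: Rule) -> Rule:
    """Combine rules: flag a sample if at least one rule flags it."""
    return lambda s: any(rule(s) for rule in rules)


def all_of(*rules: Rule) -> Rule:
    """Combine rules: flag a sample only if every rule flags it."""
    return lambda s: all(rule(s) for rule in rules)


def split(samples: Iterable[Sample], rule: Rule) -> Tuple[List[Sample], List[Sample]]:
    """Partition samples into (clean, flagged) according to the rule."""
    clean: List[Sample] = []
    flagged: List[Sample] = []
    for s in samples:
        (flagged if rule(s) else clean).append(s)
    return clean, flagged


if __name__ == "__main__":
    samples = [
        Sample(0.0, 60.00, True),
        Sample(1.0, 0.00, True),    # implausible frequency
        Sample(2.0, 59.98, False),  # device reported a bad status
        Sample(3.0, 60.02, True),
    ]
    # Hypothetical thresholds; real limits would come from domain analysis.
    is_erroneous = any_of(out_of_range(55.0, 65.0), bad_status())
    clean, flagged = split(samples, is_erroneous)
    print("clean:", [s.timestamp for s in clean])
    print("flagged:", [s.timestamp for s in flagged])
```

Representing each rule as a plain predicate keeps the combinators (`any_of`, `all_of`) independent of any particular rule, which mirrors the abstract's point that several rules may need to be applied together to remove all erroneous data.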
Application Number
13/928,108
Inventors
Gibson, Tara D
Hafen, Ryan P
Critchlow, Terence J
Market Sector
Energy Infrastructure
Data Sciences