November 7, 2023
Journal Article
Synthetic data generation for machine learning model training for energy theft scenarios using cosimulation
Abstract
Technical and non-technical losses in distribution circuits result in significant economic costs to power utilities. One type of non-technical loss is energy theft by various means, including illegal tapping of feeders, tampering with or bypassing the meter, and billing fraud. These losses are usually hard to detect and remain undetected for long periods of time. Machine learning models have proved effective in detecting these conditions, but they rely on the availability of large, good-quality training data sets. The problem is exacerbated by the imbalanced nature of data related to these conditions: energy theft, though costly, is very rare. The available data sets generally contain very few samples of actual theft, and most of the data pertains to normal operation. Data sets with such a limited share of positive samples are generally not suitable for training machine learning models. In this paper, we present an overview of energy theft detection techniques, the challenges posed by their data needs, and the limitations of current techniques for addressing the data limitations. We propose a co-simulation framework to generate reliable training data for machine learning algorithms for theft detection. We present an example scenario and build a machine learning model to detect certain kinds of energy theft.