TY - JOUR
T1 - Anomaly detection in wind turbine SCADA data for power curve cleaning
AU - Morrison, Rory
AU - Liu, Xiaolei
AU - Lin, Zi
N1 - Funding Information: The authors thank the anonymous data provider for providing the SCADA data for both wind farms. This research was funded by the EPSRC Doctoral Training Partnership (EP/R513222/1 ENG).
PY - 2022/1/1
Y1 - 2022/1/1
AB - Wind turbine power curve cleaning, by way of removing curtailment, stoppage, and other anomalies, is an essential step in making raw data usable for further analysis, such as determining turbine performance, characterising a site, or improving forecasting models. Typically, data comes as SCADA (Supervisory Control and Data Acquisition) data, and so contains not only environmental and turbine performance data but also the control actions imposed on the turbine by the operator. Many different anomaly detection (AD) methods have been proposed to clean power curves; however, few papers have explored filtering explicit and obvious anomalies from the SCADA data prior to running AD. This paper explores the impact of this filtering by comparing the performance of four AD methods with and without filtering: iForest, Local Outlier Factor, Gaussian Mixture Models, and k-Nearest Neighbours. Each approach is evaluated in terms of prediction error, data removal rate, and ability to maintain the underlying statistical characteristics of the wind. The results show that filtering is effective, with every technique improving on its unfiltered counterpart. Furthermore, Gaussian Mixture Models are shown to provide favourable accuracy whilst maintaining wind variability. However, given the wide range of performance across methods, a user's choice may differ depending on their needs.
KW - Anomaly detection
KW - Data cleaning
KW - Power curve
KW - Wind turbine
UR - http://www.scopus.com/inward/record.url?scp=85120882025&partnerID=8YFLogxK
U2 - 10.1016/j.renene.2021.11.118
DO - 10.1016/j.renene.2021.11.118
M3 - Article
AN - SCOPUS:85120882025
SN - 0960-1481
VL - 184
SP - 473
EP - 486
JO - Renewable Energy
JF - Renewable Energy
ER -