18th AIAI 2022, 17 - 20 June 2022, Greece

An Empirical Study on Anomaly Detection Algorithms for Extremely Imbalanced Datasets

Gonçalo Fontes, Luis Miguel Matos, Arthur Matta, Andre Pilastri, Paulo Cortez


  Anomaly detection attempts to identify abnormal events that deviate from normality. Since such events are often rare, data related to this domain is usually imbalanced. In this paper, we compare diverse preprocessing and Machine Learning (ML) state-of-the-art algorithms that can be adopted within this anomaly detection context. These include two unsupervised learning algorithms, namely Isolation Forests (IF) and deep dense AutoEncoders (AE), and two supervised learning approaches, namely Random Forest and an Automated ML (AutoML) method. Several empirical experiments were conducted by adopting seven extremely imbalanced public domain datasets. Overall, the IF and AE unsupervised methods obtained competitive anomaly detection results, which also have the advantage of not requiring labeled data.  

*** Title, author list and abstract as seen in the Camera-Ready version of the paper that was provided to Conference Committee. Small changes that may have occurred during processing by Springer may not appear in this window.