<!-- Paper: 108 -->
Filtering refers to the process of defining, detecting, and correcting errors in a given dataset in order to achieve system reliability and minimize the impact of errors on data analysis. Automated and accurate tools for data filtering and healing are therefore crucial to ensuring system reliability. This study investigates statistical and machine-learning-based methodologies for healing data gaps and imputing missing values. In total, five models are investigated individually: the well-known ARIMA model, linear and polynomial interpolation, general regression, and Facebook Prophet. The raw data used to evaluate these methods are simulated, and artificial data gaps are imposed at random positions within the dataset so that the univariate imputation performance of the aforementioned models can be assessed in terms of Mean Squared Error and Mean Absolute Error. As expected, the evaluation results illustrate the superior accuracy of the more elaborate machine-learning model, Facebook Prophet, over the simpler statistical ARIMA model, at the expense of time and computational effort. For Big Data univariate imputation applications, however, the study findings suggest that a combination of ARIMA and Facebook Prophet, selected according to the data gap size, could balance the required computational resources while maintaining highly accurate imputation results.
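The evaluation protocol described above (simulate a series, impose random gaps, impute, score with MSE and MAE) can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the signal, gap fraction, and the choice of linear interpolation as the imputation model are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated raw signal, a stand-in for the paper's simulated dataset
t = np.arange(500)
truth = np.sin(2 * np.pi * t / 50) + 0.1 * rng.standard_normal(500)

# Impose artificial data gaps at random positions (10% missing);
# endpoints are kept observed so interpolation is always bounded
series = truth.copy()
gap_idx = rng.choice(t[1:-1], size=50, replace=False)
series[gap_idx] = np.nan

# Univariate imputation: here, linear interpolation over observed points
observed = ~np.isnan(series)
imputed = series.copy()
imputed[~observed] = np.interp(t[~observed], t[observed], series[observed])

# Score only on the artificially removed values, where ground truth is known
errors = imputed[gap_idx] - truth[gap_idx]
mse = np.mean(errors ** 2)
mae = np.mean(np.abs(errors))
```

The same masked-evaluation loop applies unchanged when swapping in ARIMA, polynomial interpolation, regression, or Prophet as the imputation model, which is what makes the per-model MSE/MAE comparison in the study possible.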
*** Title, author list and abstract as seen in the Camera-Ready version of the paper that was provided to the Conference Committee. Small changes that may have occurred during processing by Springer may not be reflected in this window.