Recent advances in the insurance industry incorporate data-driven methods to predict insurance claims and thereby optimize premium setting. In recent literature, many studies examine Machine Learning techniques as the most promising approach to predicting insurance claims. However, such techniques do not quantify the uncertainty of their predictions, a gap that matters most in heavily imbalanced datasets, where the underlying classifiers are biased towards the majority class. In this work, we propose a novel approach that combines the machine learning framework of Conformal Prediction with the XGBoost classifier to provide valid confidence guarantees for claimants and non-claimants individually. We examine its performance on a large-scale imbalanced dataset comprising 100,000 drivers, of whom 95.73% have not reported a claim. Our experimental results demonstrate that the proposed approach produces empirically valid and unbiased outputs.
*** Title, author list and abstract as submitted during Camera-Ready version delivery. Small changes that may have occurred during processing by Springer may not appear in this window.
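
To illustrate what per-class confidence guarantees can look like in practice, the following is a minimal sketch of label-conditional (Mondrian) conformal prediction, which is one common way to obtain validity for each class separately. It is an assumption that this variant matches the approach summarized in the abstract; the abstract does not give the details. The function names, the nonconformity score (one minus the predicted probability of the candidate label), and the toy numbers are all hypothetical. The predicted probabilities are passed in as plain lists; in the setting described above they would come from a trained XGBoost classifier.

```python
from collections import defaultdict


def calibrate(cal_probs, cal_labels):
    """Group calibration nonconformity scores by their true class.

    cal_probs: list of per-class probability lists, e.g. [p_no_claim, p_claim].
    cal_labels: list of true class indices (0 = no claim, 1 = claim).
    Assumed nonconformity score: 1 - predicted probability of the true label.
    """
    scores = defaultdict(list)
    for probs, y in zip(cal_probs, cal_labels):
        scores[y].append(1.0 - probs[y])
    return dict(scores)


def p_values(scores_by_class, test_probs):
    """Compute one p-value per candidate label for a single test example.

    Each p-value compares the test score only against calibration scores
    of the same class, which is what makes the guarantee class-conditional.
    """
    pvals = {}
    for y, cal_scores in scores_by_class.items():
        alpha = 1.0 - test_probs[y]
        at_least_as_strange = sum(1 for s in cal_scores if s >= alpha)
        pvals[y] = (at_least_as_strange + 1) / (len(cal_scores) + 1)
    return pvals


def prediction_set(pvals, epsilon):
    """Return every label whose p-value exceeds the significance level."""
    return {y for y, p in pvals.items() if p > epsilon}


# Hypothetical toy calibration data: six non-claimants, two claimants.
cal_probs = [
    [0.90, 0.10], [0.80, 0.20], [0.95, 0.05], [0.70, 0.30],
    [0.60, 0.40], [0.85, 0.15], [0.40, 0.60], [0.70, 0.30],
]
cal_labels = [0, 0, 0, 0, 0, 0, 1, 1]

scores_by_class = calibrate(cal_probs, cal_labels)
pvals = p_values(scores_by_class, [0.75, 0.25])
labels_at_20_percent = prediction_set(pvals, epsilon=0.2)
```

Because the minority class is calibrated only against its own examples, a classifier that favors the majority class cannot silently trade away coverage for claimants. The price is that a small minority calibration set yields coarse p-values, as the two-claimant toy data shows.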