19th AIAI 2023, 14 - 17 June 2023, León, Spain

Explaining Machine Learning-Based Feature Selection of IDS for IoT and CPS Devices

Sesan Akintade, Seongtae Kim, Kaushik Roy


  In training machine learning and artificial intelligence models for Intrusion Detection Systems (IDS), feature selection plays a critical role in the prediction performance and explainability of the trained model. Feature selection in IDS design is often hindered by the volume, variety, and veracity of the data generated by Internet-of-Things (IoT) and Cyber-Physical Systems (CPS) devices. In this paper, we explored selecting the best subset of features to reduce the feature space of high-dimensional datasets and thereby improve performance, explainability, and computing time. We combined permutation-importance feature selection in XGB models with prediction explainability methods such as SHAP and LIME. Using two publicly available IDS datasets, NSL-KDD and CCID-V1, our feature selection-based XGB model for the NSL-KDD data reduced the features from 42 to 20 and raised the AUC score from 0.8530 to 0.8751, with a 60% improvement in training time. A similar model for the CCID-V1 data reduced the features from 82 to 22 and achieved an AUC of 0.9999 with a 46% improvement in computing time. We also observed that the SHAP and LIME explanations of the predictions were consistent in the features they identified as important. Our study demonstrated that feature selection improved performance and explainability while lowering training time, which increases the usability of our model for the design of IDS.
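
  As a rough illustration of the permutation-importance idea named in the abstract, the sketch below shuffles one feature at a time and records the resulting drop in AUC. It is not the authors' pipeline: the toy scoring function `predict` stands in for a trained XGB model, the synthetic data stands in for NSL-KDD or CCID-V1, and the 0.01 selection threshold and all function names are assumptions made only for this example.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Return the baseline AUC and, per feature, the mean AUC drop after shuffling it."""
    rng = random.Random(seed)
    baseline = auc(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - auc(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


# Synthetic data (assumption): only feature 0 carries signal about the label.
data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(3)] for _ in range(200)]
y = [1 if row[0] > 0.5 else 0 for row in X]


def predict(row):
    # Hypothetical stand-in for a trained model's score; uses feature 0 only.
    return row[0]


baseline, importances = permutation_importance(predict, X, y)

# Keep features whose shuffling costs more than an arbitrary AUC threshold.
selected = [j for j, imp in enumerate(importances) if imp > 0.01]
print("baseline AUC:", baseline)
print("importances:", importances)
print("selected features:", selected)
```

  In this toy setting only the informative feature survives the threshold; with a real model, the same loop would be run on held-out data and the retained subset used to retrain a smaller model.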
