21st AIAI 2025, 26 - 29 June 2025, Limassol, Cyprus

Handling Anonymized Non-Numerical Features in Data using Transformations for Regression Models

Mathias Selvine George, Akmal Muhammad Uzair, Asif Saara, Knollmeyer Simon, Koval Leonid, Grossmann Daniel

Abstract:

  Manufacturing environments involve many testing processes that assess product features critical to evaluating product quality. As the requirements for quality checks and balances increase, so does the time a product spends on the test bench awaiting approval or rejection. To resolve the dilemma of excluding features from the analysis without compromising pre-defined standards, it is essential to identify which tests or features are redundant or irrelevant in prediction problems, especially when the tests are anonymized for privacy or confidentiality purposes. To this end, this paper presents a comparative analysis of three types of transformations for anonymized non-numerical data, handling the high dimensionality through threshold-based data filtering. These include a novel transformation of non-numerical data into segmented and decimal-encoded samples. A public dataset containing only categorical and binary features, together with the observed quality check time in seconds, is used for this purpose. The implementation and results show that useful model performance is achievable without domain knowledge of the tests, with an R² score of over 50% on a completely unseen test set of the same size as the training set.
