22nd AIAI 2026, 16 - 19 July 2026, Chania, Crete, Greece

Bloom Filter Encoding for Machine Learning

Cartmell John, Cardei Mihaela, Ionut Ionut

Abstract:

  We present a method that uses a Bloom filter transform to preprocess data for machine learning. Each sample is encoded into a compact bit-array representation using hash-based encoding, producing a fixed-length feature space that reduces memory usage and obfuscates original feature values. The encoding does not rely on keyed hashing; however, a key can optionally be used to control the mapping and would be required to reproduce the representation. We evaluate the approach on six datasets spanning text, time-series, tabular, and image domains: SMS Spam Collection, ECG200, Adult 50K, CDC Diabetes, MNIST, and Fashion MNIST. Four classifiers are considered: Extreme Gradient Boosting, Deep Neural Networks, Convolutional Neural Networks, and Logistic Regression. Results show that models trained on Bloom filter encodings achieve performance comparable to models trained on raw data or standard dimensionality reduction techniques across several datasets, while providing consistent memory savings. These findings suggest that Bloom filter encodings can serve as an efficient, general-purpose pre-processing representation that preserves useful similarity structure for learning tasks while providing a degree of data obfuscation.  

*** Title, author list and abstract as submitted during Camera-Ready version delivery. Small changes that may have occurred during processing by Springer may not appear in this window.