We present a methodology to automatically parallelize outlier detection ensemble models using directed acyclic graphs (DAGs) embedding the MapReduce paradigm. The DAGs are built implicitly such that naive sequential computations can be transformed into efficient parallel computations without changing the underlying implementation. We show that the proposed parallelization approach is an effective strategy to combat the computational complexity inherent in ensemble learning models, leading to a near-optimal speedup in a theoretical setting, and a substantial speedup in a practical setting.
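The idea of an implicitly built DAG with a map step and a reduce step can be illustrated with a minimal Python sketch. This is not the paper's implementation: the names `Node`, `delayed`, `compute`, `zscore_detector`, and `average` are hypothetical, the z-score detector is a stand-in for a real ensemble member, and a standard-library thread pool stands in for whatever execution backend is actually used.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from statistics import mean, pstdev


class Node:
    """One deferred function call; Node arguments are the graph's edges."""

    def __init__(self, func, args):
        self.func = func
        self.args = args


def delayed(func):
    """Wrap func so that calling it records a Node instead of running it."""
    def wrapper(*args):
        return Node(func, args)
    return wrapper


def collect(node, seen):
    """Gather every Node reachable from `node`, keyed by id()."""
    if id(node) not in seen:
        seen[id(node)] = node
        for arg in node.args:
            if isinstance(arg, Node):
                collect(arg, seen)
    return seen


def compute(root, max_workers=4):
    """Run the graph, submitting each node once all its dependencies are done."""
    nodes = collect(root, {})
    results = {}  # id(node) -> computed value
    running = {}  # future -> id(node)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while id(root) not in results:
            submitted = set(running.values())
            for key, node in nodes.items():
                deps = [a for a in node.args if isinstance(a, Node)]
                ready = all(id(d) in results for d in deps)
                if key not in results and key not in submitted and ready:
                    args = [results[id(a)] if isinstance(a, Node) else a
                            for a in node.args]
                    running[pool.submit(node.func, *args)] = key
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                results[running.pop(future)] = future.result()
    return results[id(root)]


def zscore_detector(data, scale):
    """Hypothetical ensemble member: scaled absolute z-score per point."""
    mu = mean(data)
    sigma = pstdev(data) or 1.0
    return [scale * abs(x - mu) / sigma for x in data]


def average(*score_lists):
    """Reduce step: average the members' scores point by point."""
    return [mean(column) for column in zip(*score_lists)]


data = [1.0, 1.1, 0.9, 1.0, 8.0]

# Map: one independent graph node per ensemble member (nothing runs yet).
members = [delayed(zscore_detector)(data, s) for s in (0.5, 1.0, 1.5)]
# Reduce: a single node that depends on all members.
ensemble = delayed(average)(*members)

scores = compute(ensemble)
```

The sequential functions `zscore_detector` and `average` are left untouched; only the call sites are wrapped, which is the sense in which the graph is built implicitly. In CPython, threads give little speedup for CPU-bound pure-Python work, so a process pool or a distributed scheduler would be the more realistic backend.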