In the context of Genomic and Precision Medicine, prediction problems are often characterized by Big Data and by a strong imbalance between classes. This calls for specialized tools, as traditional Machine Learning approaches may struggle with large datasets and often fail to predict the minority class in imbalanced classification problems. In this work we present ParSMURF-NG, a High Performance Computing (HPC)-oriented Machine Learning approach designed to scale well on big omics data. We measured its performance on three current-generation HPC systems and showed its usefulness in the context of Genomic Medicine, providing a powerful model for the detection of pathogenic single nucleotide variants in the non-coding regions of the human genome.
*** Title, author list and abstract as seen in the Camera-Ready version of the paper that was provided to Conference Committee. Small changes that may have occurred during processing by Springer may not appear in this window.