Fraud detection in financial transactions poses several challenges at once: extreme class imbalance, evolving fraud patterns (concept drift), high-dimensional feature spaces, and the need to balance detection rates against false positives. This study introduces HASTE (Hierarchical Attention-based Stacked Ensemble), a novel deep learning architecture that combines multiple base models through a hierarchical attention mechanism with temporal awareness. HASTE learns to weight base model predictions dynamically, conditioned on the input features, which lets it prioritize high-confidence fraud predictions while adapting to concept drift through temporal attention. We evaluate HASTE against Random Forest, SVM, and Logistic Regression on four benchmark datasets: BankSim (11% fraud rate), FraudGuard (22% fraud rate, with concept drift), IEEE-CIS (1% fraud rate, 433 features), and PaySim (1% fraud rate). An evaluation using statistical metrics, McNemar's significance test, and business impact analysis (assuming a $1,000 fraud cost and a $50 false alarm cost) shows that HASTE consistently outperforms all baselines. Key results include a 91.3% relative F1 improvement on IEEE-CIS (0.1739 vs. 0.0909), an 88.8% relative F1 improvement on PaySim (0.3333 vs. 0.1765), and statistically significant advantages over SVM on three of the four datasets (p < 0.05). In business terms, HASTE improves ROI by 56% on BankSim (364.5 vs. 233.1), 93% on FraudGuard (132.8 vs. 68.7), 105% on IEEE-CIS (39.0 vs. 19.0), and 144% on PaySim (11.0 vs. 4.5). These results indicate that HASTE is a robust solution for fraud detection across diverse scenarios, including concept drift and severe class imbalance. The source code and implementation details are publicly available at: https://github.com/avokhuese/HASTE
*** Title, author list and abstract as submitted during Camera-Ready version delivery. Small changes that may have occurred during processing by Springer may not appear in this window.
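The relative improvements quoted in the abstract can be reproduced from the paired values it reports. The following is a minimal sketch, not part of the HASTE code base: the function name `relative_improvement` and the dictionary `REPORTED` are hypothetical, and the formula (gain divided by the comparison value) is assumed to be the one the authors used. The abstract does not say which baseline each comparison value belongs to, so the pairs are labeled only by metric and dataset.

```python
def relative_improvement(new: float, reference: float) -> float:
    """Return the percentage gain of `new` over `reference`."""
    if reference == 0:
        raise ValueError("reference value must be non-zero")
    return (new - reference) / reference * 100.0


# (HASTE value, comparison value) pairs exactly as quoted in the abstract.
REPORTED = {
    "F1, IEEE-CIS": (0.1739, 0.0909),
    "F1, PaySim": (0.3333, 0.1765),
    "ROI, BankSim": (364.5, 233.1),
    "ROI, FraudGuard": (132.8, 68.7),
    "ROI, IEEE-CIS": (39.0, 19.0),
    "ROI, PaySim": (11.0, 4.5),
}

if __name__ == "__main__":
    for label, (haste, reference) in REPORTED.items():
        gain = relative_improvement(haste, reference)
        print(f"{label}: {gain:.1f}% relative improvement")
```

Under this assumed formula the computed gains agree with the quoted percentages once rounded, which suggests the abstract reports relative rather than absolute differences throughout.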