The National Payments Corporation of India (NPCI) faces two major challenges: increasing Unified Payments Interface (UPI) fraud and a large, imbalanced fraud dataset. NPCI uses an ensemble of AI models to combat UPI fraud. As fraudsters' methods evolve, these models must be trained and retrained on new data. With frequent retraining across various algorithms, frequent Hyperparameter Optimization (HPO) becomes a challenging task. For models with many tunable hyperparameters, the dimensionality of the hyperparameter space grows, ruling out traditional methods such as Grid Search. This paper discusses the use of the Advantage Actor Critic (A2C) algorithm, a Reinforcement Learning (RL) algorithm, to fine-tune the hyperparameters of an XGBoost Classifier model. Its performance was compared with Random Search (RS), Bayesian Optimization (BO), Particle Swarm Optimization (PSO), and the Canonical Genetic Algorithm (CGA). A2C gave the best results, achieving the 80% F1-score target within 4.01 hours (40% faster than CGA, the next-fastest algorithm).
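To make the A2C-for-HPO idea concrete, the following is a minimal illustrative sketch, not the paper's implementation: it assumes the stable-baselines3 A2C implementation, a toy gymnasium environment whose one-step episodes map an action vector to XGBoost hyperparameters, a synthetic imbalanced dataset, and validation F1-score as the reward; the hyperparameter ranges shown are arbitrary examples.

```python
# Illustrative sketch only; assumed libraries: stable-baselines3, gymnasium,
# scikit-learn, xgboost. Reward = validation F1 of the trained classifier.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from xgboost import XGBClassifier
from stable_baselines3 import A2C


class XGBHpoEnv(gym.Env):
    """One-step episodes: each action encodes one hyperparameter configuration."""

    def __init__(self):
        super().__init__()
        # Actions in [0, 1]^3 are mapped to (max_depth, learning_rate, n_estimators).
        self.action_space = spaces.Box(low=0.0, high=1.0, shape=(3,), dtype=np.float32)
        self.observation_space = spaces.Box(low=0.0, high=1.0, shape=(1,), dtype=np.float32)
        # Synthetic imbalanced binary classification data (stand-in for real data).
        X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
        self.X_tr, self.X_val, self.y_tr, self.y_val = train_test_split(
            X, y, test_size=0.3, stratify=y, random_state=0
        )

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        return np.zeros(1, dtype=np.float32), {}

    def step(self, action):
        # Decode the continuous action into concrete hyperparameter values.
        max_depth = int(2 + action[0] * 8)              # 2 .. 10
        learning_rate = float(0.01 + action[1] * 0.3)   # 0.01 .. 0.31
        n_estimators = int(50 + action[2] * 250)        # 50 .. 300
        model = XGBClassifier(
            max_depth=max_depth,
            learning_rate=learning_rate,
            n_estimators=n_estimators,
            eval_metric="logloss",
        )
        model.fit(self.X_tr, self.y_tr)
        reward = f1_score(self.y_val, model.predict(self.X_val))
        # One configuration per episode, so the episode terminates immediately.
        return np.zeros(1, dtype=np.float32), reward, True, False, {}


if __name__ == "__main__":
    agent = A2C("MlpPolicy", XGBHpoEnv(), verbose=0)
    agent.learn(total_timesteps=50)  # each timestep trains and scores one XGBoost model
```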