|Successive halving (SHA) has become popular for hyperparameter optimization since it often yields good results while consuming a considerably lower training budget than traditional brute-force algorithms. Nevertheless, SHA is known to suffer from the crossing curves problem. A configuration that performs poorly at the start may perform very well with more budget in later rounds, causing its loss curve to cross that of competing configurations. When this happens, there is a risk that promising configurations are discarded prematurely. In this paper, we propose a new variant of SHA which we name SASHA, that combines Simulated Annealing (SA) with SHA. SASHA discards configurations stochastically based on their performance relative to the best-performing configuration. This property allows the algorithm to make more informed decisions about the configurations to be discarded each round. We study the behavior of SASHA when optimizing a simple test function and a logistic regression model. Our results indicate that SASHA can effectively deal with the crossing curves problem.
*** Title, author list and abstract as seen in the Camera-Ready version of the paper that was provided to Conference Committee. Small changes that may have occurred during processing by Springer may not appear in this window.