This article examines the use of image augmentation techniques to improve icon detection in mobile interfaces, a task made difficult by the small size of graphical user interface (GUI) elements and the lack of comprehensive datasets. It evaluates whether diversifying the dataset, or applying specific augmentation methods alone, can enhance detection performance. The study compares two models, Faster R-CNN and YOLOv8, in detecting these elements, highlighting the challenges of automating complex process interactions through improved icon recognition and the potential solutions. Our computational experiments show that applying classical image augmentation methods to increase dataset diversity significantly improves the performance of both models. Remarkably, such augmentation can yield results comparable to, or even exceeding, those obtained by training the models on considerably larger datasets. Notably, models perform best when they are initially provided with a substantial volume of annotations, surpassing models trained on more extensive data collections. Among the augmentation techniques evaluated, image rotation proved the most effective for both models. Nonetheless, Faster R-CNN consistently outperformed YOLOv8 across all experiments.
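As an illustrative sketch only, not the authors' exact pipeline: classical, bounding-box-aware augmentation of the kind discussed above (including rotation, which the study found most effective) can be expressed with the albumentations library. The image, box, label, rotation limit, and probability values below are assumed example values for demonstration.

```python
# Minimal sketch: bounding-box-aware rotation augmentation for icon detection
# training data. Assumes the albumentations library; parameter values are
# illustrative, not the settings used in the paper.
import numpy as np
import albumentations as A

# Hypothetical sample: a screenshot-like image with one icon bounding box
# in pascal_voc format (x_min, y_min, x_max, y_max) and a class label.
image = np.zeros((640, 360, 3), dtype=np.uint8)
bboxes = [(120, 200, 152, 232)]
labels = ["settings_icon"]

# Classical augmentations; boxes are transformed together with the image.
transform = A.Compose(
    [
        A.Rotate(limit=15, p=0.7),          # small random rotation
        A.RandomBrightnessContrast(p=0.3),  # mild photometric variation
    ],
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"]),
)

augmented = transform(image=image, bboxes=bboxes, labels=labels)
print(augmented["bboxes"], augmented["labels"])  # boxes follow the rotation
```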