The ISIC archive is an open dermoscopy dataset containing thousands of images for training new deep-learning skin-lesion classifiers. The ISIC Challenges attract many participants who build models aiming for the best performance on the ISIC test dataset. The question is whether such models behave consistently across different datasets and on other clinical images. In this work, we build a classifier trained on the ISIC 2019 dataset and study its performance in three settings: during the cross-validation training process, on the separate ISIC 2019 test dataset, and on dermoscopy images taken at the SYGGROS skin disease hospital. The results show stable performance, as measured by the F1 score, for the categories with more than 3000 images in the training dataset. In addition, we identify the factors that make it difficult to transfer a classifier from a competitive to a clinical setting.