Semi-supervised learning is a promising approach for expediting the annotation of medical images. In this paper, we use diffusion models to learn visual representations from multi-modal medical images in an unsupervised setting. These learned representations are then employed for the challenging downstream task of brain tumor segmentation. To avoid the feature selection required when using pixel-level classifiers, we propose fine-tuning the noise-predictor network for semantic segmentation. We compare these methods against a supervised baseline over a varying number of training samples and evaluate their performance on a substantially larger test set. Our results show that, with fewer than 20 training samples, all methods outperform the supervised baseline across all tumor regions. Additionally, we present a practical use case for patient-level tumor segmentation under limited supervision. Our code and trained diffusion model are publicly available.
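To make the fine-tuning idea concrete, below is a minimal PyTorch sketch; it is not the authors' implementation. A stand-in noise-predictor network (here `TinyNoisePredictor`, a hypothetical placeholder for a diffusion-pretrained U-Net) has its noise-prediction head swapped for a segmentation head and is fine-tuned with cross-entropy on inputs noised at a fixed timestep, mirroring the diffusion pretraining objective. All names and hyperparameters (`NUM_CLASSES`, `FIXED_T`, the noise schedule) are illustrative assumptions.

```python
# Minimal sketch, assuming a DDPM-style noise predictor; NOT the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNoisePredictor(nn.Module):
    """Stand-in for a diffusion-pretrained U-Net that predicts noise eps(x_t, t)."""
    def __init__(self, in_ch=4, hidden=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(hidden, in_ch, 1)  # pretraining: noise output

    def forward(self, x, t):
        # A real model would condition on t via timestep embeddings;
        # omitted here to keep the sketch short.
        return self.head(self.body(x))

NUM_CLASSES = 4   # assumption: background + three tumor regions (BraTS-style)
FIXED_T = 50      # assumption: one fixed, small noise level for fine-tuning

model = TinyNoisePredictor(in_ch=4)          # 4 MRI modalities as channels
# (load diffusion-pretrained weights into `model` here)
model.head = nn.Conv2d(32, NUM_CLASSES, 1)   # swap noise head for seg logits

betas = torch.linspace(1e-4, 0.02, 1000)     # standard linear DDPM schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

def finetune_step(images, masks, optimizer):
    """One supervised step: noise the input as in q(x_t | x_0), predict masks."""
    a = alpha_bar[FIXED_T]
    x_t = a.sqrt() * images + (1 - a).sqrt() * torch.randn_like(images)
    logits = model(x_t, torch.full((images.size(0),), FIXED_T))
    loss = F.cross_entropy(logits, masks)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage on dummy data:
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
imgs = torch.randn(2, 4, 64, 64)                  # batch of 2 slices, 64x64
msks = torch.randint(0, NUM_CLASSES, (2, 64, 64)) # integer class masks
print(finetune_step(imgs, msks, opt))
```

Reusing the pretrained weights and the pretraining-time noising step is the point of the design: the segmentation head sees the same noisy-input distribution the backbone was trained on, so no separate feature extraction or per-layer feature selection is needed.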