Facial expression recognition is a crucial task in computer vision, with applications across many domains. In this paper, we propose a self-supervised learning approach for accurate facial expression recognition. Our approach builds on diffusion models, specifically the Classification and Regression Diffusion (CARD) model. To enhance the discriminative capability of the model, we integrate the Dual-Direction Attention Module (DDAM), which captures long-range dependencies and extracts robust feature representations. The DDAM generates attention maps in two orientations, improving the model's ability to focus on the distinct facial regions that are critical for expression analysis. Furthermore, we exploit unlabelled data through a simple contrastive self-supervised learning (SSL) framework that extracts meaningful features. To evaluate performance, we conduct extensive experiments on the FER2013 dataset and compare our results with existing benchmarks. Our model achieves 67.4% accuracy on FER2013, a significant improvement over the compared methods. These quantitative results demonstrate the efficacy of the proposed SSL-based model for accurate and robust facial expression recognition.
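The abstract does not state which contrastive objective is used. As a hedged illustration only, assuming a SimCLR-style setup in which two augmented views of the same unlabelled face image form a positive pair, the normalized temperature-scaled cross-entropy (NT-Xent) loss can be sketched in plain Python. The function name `nt_xent_loss`, the helper `_normalize`, and the default temperature of 0.5 are hypothetical choices for this sketch, not the authors' implementation or settings.

```python
import math
from typing import List

Vector = List[float]


def _normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def nt_xent_loss(z_a: List[Vector], z_b: List[Vector], temperature: float = 0.5) -> float:
    """Hypothetical SimCLR-style NT-Xent loss (assumption, not confirmed as the paper's loss).

    z_a[i] and z_b[i] are embeddings of two augmented views of image i.
    All 2N embeddings are pooled; for each anchor, its partner view is the
    positive and the remaining 2N - 2 embeddings act as negatives.
    """
    if len(z_a) != len(z_b) or not z_a:
        raise ValueError("z_a and z_b must be non-empty and the same length")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    n = len(z_a)
    z = [_normalize(v) for v in z_a + z_b]  # 2N unit vectors
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same image
        # Temperature-scaled cosine similarity to every embedding except the anchor itself.
        logits = {
            k: sum(a * b for a, b in zip(z[i], z[k])) / temperature
            for k in range(2 * n)
            if k != i
        }
        # Numerically stable log-sum-exp over all non-self candidates.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += log_denom - logits[pos]  # equals -log softmax probability of the positive
    return total / (2 * n)


# Usage: matched views should give a lower loss than mismatched ones.
aligned = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

The loss is low when each embedding is most similar to its own partner view and high when a different image's view is closer, which is what pushes an encoder to learn augmentation-invariant features from unlabelled faces. In a real pipeline the embeddings would come from a neural encoder and the loss would be computed with a tensor library; the list-based version here only makes the arithmetic explicit.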