| The rapid growth of online platforms has generated vast amounts of user-generated text, increasing the need for effective emotion recognition methods. This paper presents a statistical comparison of two widely used transformer-based models, BERT and RoBERTa, for fine-grained emotion detection in technical online texts. Using data from Stack Overflow and the 28-class GoEmotions taxonomy, we analyze model behavior beyond standard predictive evaluation. The results indicate substantial agreement between the models ($\kappa = 0.669$). The models differ in their confidence behavior, however: BERT produces consistently higher confidence scores, indicating overconfidence, whereas RoBERTa exhibits more balanced and better-calibrated predictions. We also examine the relationship between detected emotions and user engagement and find no meaningful correlation between model confidence and comment popularity. Overall, the findings show that while both models are reliable at the sentiment level, important differences emerge in fine-grained emotion classification and confidence behavior, emphasizing the need for statistical and behavioral evaluation in domain-specific emotion analysis. |