| Hate speech and misogyny detection using Natural Language Processing (NLP) tools has gained significant attention in recent years. However, limited research has addressed these challenges in code-mixed and low-resource settings. In multilingual societies, a substantial proportion of online communication occurs in hybrid linguistic forms such as Hinglish (Hindi-English), where inconsistent transliteration, mixed grammatical structures, and cultural nuances complicate automated detection. This study introduces a novel expert-annotated dataset for binary and contextual misogyny detection in code-mixed Hindi-English text collected from Reddit. We conduct a comparative evaluation of traditional machine learning models alongside the transformer-based multilingual Bidirectional Encoder Representations from Transformers (mBERT) model. Experimental results demonstrate that mBERT significantly outperforms the baseline machine learning models in binary misogyny detection, with a macro-F1 score of 0.8214, while mBERT embeddings combined with Logistic Regression (LR) perform marginally better, with a macro-F1 score of 0.8218. Paired t-tests further confirm that the results are statistically significant. |
*** Title, author list and abstract as submitted during Camera-Ready version delivery. Small changes that may have occurred during processing by Springer may not appear in this window.