Online Social Networks have drastically changed public and private communication, and they offer an abundance of data for studying societal trends on global concerns such as immigration. However, the informal tone, linguistic diversity, and brevity of tweets make evaluating immigration-related content on social media platforms like X (formerly Twitter) uniquely challenging. To address these issues, we propose a multilingual data processing and classification pipeline that uses Machine Learning techniques to identify and analyze tweets about immigration. To this end, we investigate several models, namely Support Vector Machine, SetFit, Naive Bayes, Logistic Regression, and Neural Networks, and we assess their performance on a multilingual dataset. Our results show that SetFit achieves the highest F1-score, 0.9, significantly outperforming other models such as Naive Bayes (F1: 0.74). Our research could offer valuable insights to policymakers and social scientists, helping them monitor, analyze, and respond to real-time discussions on immigration and other societal issues.
*** Title, author list and abstract as submitted during Camera-Ready version delivery. Small changes that may have occurred during processing by Springer may not appear in this window.