Online Social Networks (OSNs) have drastically changed public and private communication and offer an abundance of data for studying societal trends on global concerns such as immigration. However, evaluating immigration-related content on platforms like X (formerly "Twitter") presents unique challenges because of tweets' informal tone, linguistic diversity, and brevity. To address these challenges, we propose a multilingual data processing and classification pipeline that uses Machine Learning techniques to identify and analyze tweets about immigration. We evaluated the performance of traditional and state-of-the-art algorithms and models on a multilingual dataset. Our results demonstrate that transformer-based SetFit models achieve the highest F1 scores, with the top-performing model reaching 0.96, significantly outperforming traditional methods such as Naive Bayes. Our research could help policymakers and social scientists monitor, analyze, and respond to real-time discussions on immigration and other societal issues.
*** Title, author list, and abstract as submitted in the camera-ready version. Small changes made during processing by Springer may not be reflected here.