Automated tag prediction for competitive programming problems is critical for organizing problem archives, enhancing personalized learning, and enabling efficient problem-solving. However, existing approaches are hindered by inconsistent labeling, severe tag imbalances, and reliance on large, general-purpose models with high computational costs. This work addresses these issues by integrating domain adaptation techniques into transformer-based architectures to generate high-quality, domain-specific embeddings. We first construct a comprehensive dataset from Codeforces contests, comprising over 7,000 problems with rich metadata. We employ a multi-tiered preprocessing strategy to mitigate tag imbalance and improve textual input quality. Our experiments evaluate three classifier architectures: a chain classifier that leverages inter-label dependencies, a One-vs-All framework with independent transformer-based classifiers, and a unified multi-label model. Comparative analyses against standard training paradigms and large models (including GPT-4o, GPT-4o-mini, and o1-mini) demonstrate that our domain-adapted approach achieves significant improvements, with gains in F1 and AUC of up to 5-10 percentage points on key tag subsets. Our findings confirm that tailored domain adaptation and strategic data preprocessing can bridge the performance gap between generic large-scale language models and specialized systems for competitive programming. Our code and data are publicly available at https://github.com/DinuGeorge0019/MLCP.
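
For readers unfamiliar with the unified multi-label setup mentioned above, the following is a minimal sketch (not the authors' implementation) of how a transformer can be fine-tuned for multi-label tag prediction. The checkpoint name "bert-base-uncased", the example tag list, and the decision threshold are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch of a unified multi-label tag classifier.
# Assumptions: a generic HuggingFace checkpoint and a small, hypothetical
# subset of Codeforces tags; the paper's domain-adapted embeddings and
# preprocessing pipeline are not reproduced here.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

TAGS = ["dp", "graphs", "greedy", "math", "implementation"]  # illustrative subset

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(TAGS),
    problem_type="multi_label_classification",  # sigmoid outputs + BCE loss
)

statement = "Given an array of n integers, count subarrays whose sum is divisible by k."
inputs = tokenizer(statement, truncation=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Each tag gets an independent probability; a threshold yields the predicted set.
probs = torch.sigmoid(logits).squeeze(0)
predicted = [tag for tag, p in zip(TAGS, probs) if p > 0.5]
print(predicted)  # with untrained classification weights, this output is arbitrary
```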