22nd AIAI 2026, 16 - 19 July 2026, Chania, Crete, Greece

CamemBERT Based Architecture for Automated Essay Scoring in French Language Testing

Faye Raymond Maurice, Abdallah Wejden, Mouchnino Julien, Casanova Dominique

Abstract:

Automated Essay Scoring (AES) systems have gained increasing attention for their ability to provide consistent, scalable, and immediate feedback in educational assessments. In this paper, we present a novel AES system tailored to the Test d’Évaluation de Français (TEF). It builds on a fine-tuned CamemBERT model (almanach/camembertv2-base), a French-language transformer pretrained on large corpora. Our approach combines deep contextual embeddings with a multi-output regression head that predicts four linguistic sub-scores: coherence, lexical richness, syntactic accuracy, and communicative adequacy. These sub-scores are then linearly combined to produce the final score. The system was trained and evaluated on a real-world dataset of 42,200 TEF essays from 21,110 candidates. On held-out test data, it achieves an R² of 0.64 and an exact CEFR-level agreement of 66.3%, compared with 54.5% for human inter-rater agreement. These results suggest the approach is promising for automated scoring in high-stakes French language proficiency exams, while addressing fairness and scalability challenges in language assessment.
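
To make the scoring and evaluation steps in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It shows a linear combination of the four sub-scores, the standard R² formula, and an exact-agreement rate over CEFR labels. The equal weights, the function names, and any sample values are assumptions for illustration only; the abstract does not give the actual weights or score scales.

```python
from typing import Dict, Sequence

# Sub-score names follow the abstract. The equal weights are an assumption:
# the abstract only says the sub-scores are "linearly combined".
SUB_SCORES = (
    "coherence",
    "lexical_richness",
    "syntactic_accuracy",
    "communicative_adequacy",
)
ASSUMED_WEIGHTS: Dict[str, float] = {name: 0.25 for name in SUB_SCORES}


def combine_sub_scores(
    sub_scores: Dict[str, float],
    weights: Dict[str, float] = ASSUMED_WEIGHTS,
) -> float:
    """Return the weighted sum of the four predicted sub-scores."""
    return sum(weights[name] * sub_scores[name] for name in SUB_SCORES)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def exact_agreement(levels_a: Sequence[str], levels_b: Sequence[str]) -> float:
    """Fraction of essays for which two raters assign the same CEFR level."""
    matches = sum(a == b for a, b in zip(levels_a, levels_b))
    return matches / len(levels_a)
```

The same `exact_agreement` function can compare model output against a human rater or two human raters against each other, which is the kind of comparison behind the 66.3% vs. 54.5% figures reported above.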
