22nd AIAI 2026, 16 - 19 July 2026, Chania, Crete, Greece

Compact vs Mid-Scale LLMs for Customer Support: A Deployment-Oriented Benchmark of Unified Classification and Response Generation

Khan Talhat, Ryan Conor

Abstract:

Large Language Models (LLMs) are transforming customer support automation, yet deployment requires balancing response quality against the computational costs of fine-tuning and inference. This work investigates whether a compact model (TinyLlama-1.1B) can rival mid-scale models (Llama-2-7B, Mistral-7B) when adapted for unified customer support tasks. We evaluate a joint-task setting in which a single model generates the agent response and predicts the intent and category labels used for routing. To establish a rigorous baseline, we compare the fine-tuned models against larger non-fine-tuned models, including Llama-3.1-8B, Falcon-40B, and Llama-2-70B. Experiments across two datasets, general customer service (CSD) and heterogeneous IT support (CITD), demonstrate that domain-specific fine-tuning is indispensable: even 70B-parameter base models fail to maintain structured routing and produce generic replies. After parameter-efficient fine-tuning (QLoRA), TinyLlama achieves generation quality (BLEU/ROUGE-L) competitive with the 7B models and approaches their routing accuracy on the CSD dataset, while training up to 5× faster on a single GPU. Furthermore, an author-led error analysis identifies critical failure modes, such as procedural omissions and corrupted outputs, that automated metrics fail to capture. Our results suggest that, for specialised support domains, compact fine-tuned LLMs offer a high-performance, resource-efficient alternative to larger architectures, making them well suited to practical, cost-sensitive deployments.
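As an illustration of the joint-task setting, the sketch below shows one way a single generated string could be split into routing labels and a reply, and how outputs that lose the structure could be counted as routing errors. The `Intent:` / `Category:` / `Response:` layout, the function names, and the scoring rule are assumptions made for this example only; the abstract does not specify the output format or how malformed outputs are scored.

```python
import re

# Hypothetical output layout (an assumption, not taken from the paper):
#   Intent: <label>
#   Category: <label>
#   Response: <free text, possibly spanning several lines>
_PATTERN = re.compile(
    r"intent\s*:\s*(?P<intent>.*?)\s*\n\s*"
    r"category\s*:\s*(?P<category>.*?)\s*\n\s*"
    r"response\s*:\s*(?P<response>.*)",
    re.IGNORECASE | re.DOTALL,
)


def parse_joint_output(text):
    """Return a dict with intent, category and response, or None if malformed."""
    match = _PATTERN.search(text)
    if match is None:
        return None
    fields = {name: value.strip() for name, value in match.groupdict().items()}
    # An empty label or an empty reply is treated as a structural failure.
    return fields if all(fields.values()) else None


def routing_accuracy(outputs, gold_labels):
    """Fraction of outputs whose intent AND category both match the gold pair.

    Assumption for this sketch: an output that cannot be parsed counts as wrong.
    """
    if not gold_labels:
        return 0.0
    correct = 0
    for text, (intent, category) in zip(outputs, gold_labels):
        parsed = parse_joint_output(text)
        if parsed and parsed["intent"] == intent and parsed["category"] == category:
            correct += 1
    return correct / len(gold_labels)


if __name__ == "__main__":
    structured = (
        "Intent: cancel_order\n"
        "Category: ORDER\n"
        "Response: I can help with that.\nPlease share your order number."
    )
    generic = "Thanks for reaching out! How can I help?"
    gold = [("cancel_order", "ORDER"), ("cancel_order", "ORDER")]
    print(parse_joint_output(structured))
    print(routing_accuracy([structured, generic], gold))
```

Under this scoring rule, a reply that is fluent but drops the label fields contributes nothing to routing accuracy, which matches the kind of failure the abstract attributes to non-fine-tuned base models.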
