21st AIAI 2025, 26 - 29 June 2025, Limassol, Cyprus

LLM-Based Automated Hallucination Detection in Multilingual Customer Service RAG Applications

Patel Nikilkumar, Mouratidis Haralambos, Ng Kai Zhi Kenneth

Abstract:

  With their strong human-level capabilities and rapid advancements in quality and affordability, closed Large Language Models (LLMs) are increasingly being integrated into real-world solutions. However, hallucinations in LLM-generated responses contribute to misinformation, deception, and mistrust, ultimately compromising user safety and the reliability of these solutions, even when external knowledge is incorporated through Retrieval-Augmented Generation (RAG). The limited effectiveness and generalization of current LLM-based hallucination detection methods further exacerbate this issue. Focusing on the multilingual customer service domain, we explore LLM-based automatic hallucination detection methods (using an LLM as a judge) for closed LLMs and assess their effectiveness in practical Question Answering (QA) RAG applications. We conduct a systematic evaluation of multiple hallucination detection methods in a controlled setting, leveraging our manually labelled real-world dataset. Ultimately, we find that while existing detection methods perform well on well-structured public datasets, they struggle when applied to complex real-world scenarios like ours due to operational constraints and model limitations.
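For illustration only, the following is a minimal sketch of the LLM-as-a-judge pattern the abstract refers to: a judge model is shown the retrieved context alongside the generated answer and asked whether every claim in the answer is supported by that context. The prompt wording and the `call_llm` parameter are hypothetical placeholders standing in for any LLM client, not the authors' actual implementation or prompts.

```python
from typing import Callable

# Hypothetical judge prompt; the paper's actual prompts are not reproduced here.
JUDGE_PROMPT = """You are a strict fact-checking judge.

Context (retrieved passages):
{context}

Answer to evaluate:
{answer}

Does the answer contain any claim that is not supported by the context?
Reply with exactly one word: HALLUCINATED or GROUNDED."""


def judge_answer(
    call_llm: Callable[[str], str],  # assumed client: takes a prompt, returns text
    context: str,
    answer: str,
) -> bool:
    """Return True if the judge model flags the answer as hallucinated."""
    verdict = call_llm(JUDGE_PROMPT.format(context=context, answer=answer))
    return verdict.strip().upper().startswith("HALLUCINATED")


if __name__ == "__main__":
    # Stub judge that always answers GROUNDED, just to show the wiring.
    stub = lambda prompt: "GROUNDED"
    print(judge_answer(stub,
                       context="The store opens at 9am on weekdays.",
                       answer="The store opens at 9am on weekdays."))
```

In a real deployment, `call_llm` would wrap a closed-model API call, and the binary verdict could be replaced with a graded score or span-level labels; the abstract's finding is that such judges, while effective on clean public benchmarks, degrade on noisy multilingual production traffic.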
