22nd AIAI 2026, 16 - 19 July 2026, Chania, Crete, Greece

Toward Dynamic Evaluation of Factual Numeric Recall in Large Language Models via Next Prime Identification

Çulha Davut

Abstract:

Large Language Models (LLMs) have achieved major breakthroughs in natural language processing; however, their ability to generalize factual abstraction, especially in non-linguistic domains, remains insufficiently explored. This paper introduces Prime Number Prompt Power (PNPP), a dynamic, prompt-based evaluation framework designed to assess LLMs’ internalization of numeric facts. It uses prime numbers because their irregular distribution and the absence of a deterministic formula make them an ideal choice for testing generalization. PNPP challenges models to identify the next prime number following a given value across exponentially increasing intervals and defines a quantitative threshold (PNPPₓ), representing the largest interval range in which a model maintains a specified accuracy level. Experiments across leading LLMs demonstrate PNPP’s ability to distinguish numeric recall strengths, revealing differences in generalization patterns under increasing task complexity. As a lightweight, dataset-free, and scalable benchmark, PNPP complements traditional NLP evaluations and offers a principled approach to measuring factual knowledge in LLMs. Empirical results show that PNPP differentiates models based on their numeric recall capabilities, with high-performing models maintaining accuracy over broader intervals. The framework’s ability to isolate factual knowledge makes it valuable for benchmarking, model selection, and guiding future training strategies for AI systems. Compared to static or language-oriented benchmarks, PNPP introduces a dynamically scalable approach that directly evaluates factual numeric recall, an ability traditional benchmarks do not capture. By using progressively expanding intervals, PNPP uncovers differences in generalization invisible in conventional evaluations, offering a more robust and contamination-resistant measure of factual knowledge in LLMs.
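
The evaluation loop described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the interval boundaries, the sample size per interval, or exactly how "maintains a specified accuracy level" is decided, so everything concrete below is an assumption made for illustration: decade intervals [10^(k-1), 10^k), 50 random queries per interval, a threshold that stops at the first interval whose accuracy falls below x, and the hypothetical names `interval_accuracy`, `pnpp_threshold` and `toy_model`. A real run would replace `toy_model` with a call to an LLM and a parser for its reply.

```python
import random


def is_prime(n: int) -> bool:
    """Trial-division primality test (adequate for the small ranges used here)."""
    if n < 2:
        return False
    if n < 4:
        return True
    if n % 2 == 0:
        return False
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def next_prime(n: int) -> int:
    """Ground truth: the smallest prime strictly greater than n."""
    candidate = n + 1
    while not is_prime(candidate):
        candidate += 1
    return candidate


def interval_accuracy(answer_fn, lo, hi, samples, rng):
    """Fraction of random queries in [lo, hi) that answer_fn gets exactly right."""
    hits = 0
    for _ in range(samples):
        n = rng.randrange(lo, hi)
        if answer_fn(n) == next_prime(n):
            hits += 1
    return hits / samples


def pnpp_threshold(answer_fn, x, max_exp=6, samples=50, seed=0):
    """Assumed reading of PNPP_x: the largest exponent k such that accuracy
    stays at or above x on every interval [10**(j-1), 10**j) for j <= k.
    Returns 0 if even the first interval falls below x."""
    rng = random.Random(seed)
    best = 0
    for k in range(1, max_exp + 1):
        acc = interval_accuracy(answer_fn, 10 ** (k - 1), 10 ** k, samples, rng)
        if acc < x:
            break
        best = k
    return best


def toy_model(n: int) -> int:
    """Hypothetical stand-in for an LLM: exact below 1000, always wrong above."""
    return next_prime(n) if n < 1000 else n


if __name__ == "__main__":
    print(pnpp_threshold(toy_model, x=0.9))
```

Because the queries are drawn at random from each interval at evaluation time, no fixed dataset is needed, which is the property the abstract refers to as dataset-free and contamination-resistant. Other interval bases or aggregation rules would fit the same description equally well.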

*** Title, author list and abstract as submitted during Camera-Ready version delivery. Small changes that may have occurred during processing by Springer may not appear in this window.
