21st AIAI 2025, 26 - 29 June 2025, Limassol, Cyprus

Benchmarking Vision Language Models on German Factual Data

René Peinl, Vincent Tischler

Abstract:

  Similar to LLMs, the development of VLMs is mainly driven by English datasets and by models trained in English and Chinese, whereas support for other languages, even high-resource languages such as German or French, is considerably less pronounced. In this work, we present an analysis of open-weight VLMs on factual knowledge in German and English. We disentangle the image-related aspects from the textual ones by analyzing accuracy for both prompt languages and for images from German and international contexts. We found that in two of the categories, VLMs struggle because they lack perceptual knowledge of the image contents, whereas in two other categories the tested models are often able to correctly identify the image contents by scientific name or English common name, but not by German name. The remaining two categories show no significant difference between English and German image contents or prompt language.
