Evaluating the quality of machine-generated open-ended text is a long-standing challenge in Natural Language Processing (NLP). Despite dramatic advances in the machine learning technologies that have propelled research on Natural Language Generation (NLG), the subfield of NLP that focuses on text generation, a reliable and widely adopted automatic evaluation technique for NLG tasks has yet to be developed. In this paper, we propose leveraging conversational Large Language Models (LLMs) as automatic evaluators for several open-ended NLG tasks. Our experiments with ChatGPT, a recently released conversational LLM, demonstrate the viability of our proposal.