https://arxiv.org/abs/2302.04023
A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity

This paper proposes a framework for quantitatively evaluating interactive LLMs such as ChatGPT using publicly available data sets. We carry out an extensive technical evaluation of ChatGPT using 23 data sets covering 8 different common NLP application tasks...