
https://arxiv.org/abs/2206.10498 PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about ChangeGenerating plans of action, and reasoning about change have long been considered a core competence of intelligent agents. It is thus no surprise that evaluating the planning and reasoning capabilities of large language models (LLMs) has become a hot topic..