Posts from 2025/03/11

A Quick Look at Agent Benchmarks - TravelPlanner, REALM-Bench, PlanBench

https://arxiv.org/abs/2206.10498
PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change
Generating plans of action, and reasoning about change have long been considered a core competence of intelligent agents. It is thus no surprise that evaluating the planning and reasoning capabilities of large language models (LLMs) has become a hot topic..

Artificial Intelligence/Paper Reviews or In Progress 02:58:47


공대생 도전 일지 (An Engineering Student's Challenge Log)

