Multi-turn, Long-context Benchmark Papers 4

이게될까 2026. 2. 4. 02:51

https://aclanthology.org/2024.emnlp-main.811/

LLM Task Interference: An Initial Study on the Impact of Task-Switch in Conversational History

Akash Gupta, Ivaxi Sheth, Vyas Raina, Mark Gales, Mario Fritz. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024.

https://arxiv.org/abs/2502.05167

NoLiMa: Long-Context Evaluation Beyond Literal Matching

Recent large language models (LLMs) support long contexts ranging from 128K to 1M tokens. A popular method for evaluating these capabilities is the needle-in-a-haystack (NIAH) test, which involves retrieving a "needle" (relevant information) from a "haysta

https://arxiv.org/abs/2501.17399

MultiChallenge: A Realistic Multi-Turn Conversation Evaluation Benchmark Challenging to Frontier LLMs

We present MultiChallenge, a pioneering benchmark evaluating large language models (LLMs) on conducting multi-turn conversations with human users, a crucial yet underexamined capability for their applications. MultiChallenge identifies four categories of c

https://arxiv.org/abs/2505.17123

MTR-Bench: A Comprehensive Benchmark for Multi-Turn Reasoning Evaluation

Recent advances in Large Language Models (LLMs) have shown promising results in complex reasoning tasks. However, current evaluations predominantly focus on single-turn reasoning scenarios, leaving interactive tasks largely unexplored. We attribute it to t

https://arxiv.org/abs/2403.06447

CoRAL: Collaborative Retrieval-Augmented Large Language Models Improve Long-tail Recommendation

The long-tail recommendation is a challenging task for traditional recommender systems, due to data sparsity and data imbalance issues. The recent development of large language models (LLMs) has shown their abilities in complex reasoning, which can help to

License: Attribution-NonCommercial (CC BY-NC)