https://aclanthology.org/2024.emnlp-main.811/
LLM Task Interference: An Initial Study on the Impact of Task-Switch in Conversational History
Akash Gupta, Ivaxi Sheth, Vyas Raina, Mark Gales, Mario Fritz. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024.
https://arxiv.org/abs/2502.05167
NoLiMa: Long-Context Evaluation Beyond Literal Matching
Recent large language models (LLMs) support long contexts ranging from 128K to 1M tokens. A popular method for evaluating these capabilities is the needle-in-a-haystack (NIAH) test, which involves retrieving a "needle" (relevant information) from a "haystack" …
https://arxiv.org/abs/2501.17399
MultiChallenge: A Realistic Multi-Turn Conversation Evaluation Benchmark Challenging to Frontier LLMs
We present MultiChallenge, a pioneering benchmark evaluating large language models (LLMs) on conducting multi-turn conversations with human users, a crucial yet underexamined capability for their applications. MultiChallenge identifies four categories of …
https://arxiv.org/abs/2505.17123
MTR-Bench: A Comprehensive Benchmark for Multi-Turn Reasoning Evaluation
Recent advances in Large Language Models (LLMs) have shown promising results in complex reasoning tasks. However, current evaluations predominantly focus on single-turn reasoning scenarios, leaving interactive tasks largely unexplored. We attribute it to …
https://arxiv.org/abs/2403.06447
CoRAL: Collaborative Retrieval-Augmented Large Language Models Improve Long-tail Recommendation
The long-tail recommendation is a challenging task for traditional recommender systems, due to data sparsity and data imbalance issues. The recent development of large language models (LLMs) has shown their abilities in complex reasoning, which can help to …