https://arxiv.org/abs/2504.04150
Reasoning on Multiple Needles In A Haystack
The Needle In A Haystack (NIAH) task has been widely used to evaluate the long-context question-answering capabilities of Large Language Models (LLMs). However, its reliance on simple retrieval limits its effectiveness. To address this limitation, recent s
https://aclanthology.org/2025.naacl-long.267/
Multilingual Needle in a Haystack: Investigating Long-Context Behavior of Multilingual Large Language Models
Amey Hengle, Prasoon Bajpai, Soham Dan, Tanmoy Chakraborty. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 2025.
https://arxiv.org/abs/2503.00353
U-NIAH: Unified RAG and LLM Evaluation for Long Context Needle-In-A-Haystack
Recent advancements in Large Language Models (LLMs) have expanded their context windows to unprecedented lengths, sparking debates about the necessity of Retrieval-Augmented Generation (RAG). To address the fragmented evaluation paradigms and limited cases
https://aclanthology.org/2025.emnlp-main.1497/
Sequential-NIAH: A Needle-In-A-Haystack Benchmark for Extracting Sequential Needles from Long Contexts
Yifei Yu, Qian-Wen Zhang, Lingfeng Qiao, Di Yin, Fang Li, Jie Wang, Chen Zeng Xi, Suncong Zheng, Xiaolong Liang, Xing Sun. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025.