
Multi-turn, Long-context Benchmark Papers 3

이게될까 2026. 1. 31. 02:50

https://arxiv.org/abs/2504.04150

 

Reasoning on Multiple Needles In A Haystack

The Needle In A Haystack (NIAH) task has been widely used to evaluate the long-context question-answering capabilities of Large Language Models (LLMs). However, its reliance on simple retrieval limits its effectiveness. To address this limitation, recent …

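The abstract's point is that a single retrievable needle only tests lookup, so the haystack needs several needles whose facts must be combined. As a rough illustration only (the function name, filler text, needle sentences, and question below are made up for this post, not taken from the paper), a multi-needle prompt can be assembled like this:

```python
import random

def build_multi_needle_prompt(haystack_paragraphs, needles, question, seed=0):
    """Scatter several 'needle' sentences at random positions in the haystack
    and append a question that requires combining them."""
    rng = random.Random(seed)
    paragraphs = list(haystack_paragraphs)
    # Pick distinct insertion points so needle position and order can be varied.
    positions = sorted(rng.sample(range(len(paragraphs) + 1), len(needles)))
    for offset, (pos, needle) in enumerate(zip(positions, needles)):
        # Each earlier insertion shifts later indices by one, hence the offset.
        paragraphs.insert(pos + offset, needle)
    context = "\n\n".join(paragraphs)
    return f"{context}\n\nQuestion: {question}\nAnswer:"

# Hypothetical example: the two facts must be chained (reasoning, not just retrieval).
needles = [
    "Fact: The red key opens the northern gate.",
    "Fact: The northern gate leads to the archive room.",
]
question = "Which room does the red key ultimately give access to?"
filler = [f"Filler paragraph {i} about unrelated topics." for i in range(50)]
prompt = build_multi_needle_prompt(filler, needles, question)
```

Varying the number of needles, their positions, and the total haystack length then gives the usual NIAH-style grid of test cases.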

 

 

https://aclanthology.org/2025.naacl-long.267/

 

Multilingual Needle in a Haystack: Investigating Long-Context Behavior of Multilingual Large Language Models

Amey Hengle, Prasoon Bajpai, Soham Dan, Tanmoy Chakraborty. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 2025.


 

https://arxiv.org/abs/2503.00353

 

U-NIAH: Unified RAG and LLM Evaluation for Long Context Needle-In-A-Haystack

Recent advancements in Large Language Models (LLMs) have expanded their context windows to unprecedented lengths, sparking debates about the necessity of Retrieval-Augmented Generation (RAG). To address the fragmented evaluation paradigms and limited cases …


 

https://aclanthology.org/2025.emnlp-main.1497/

 

Sequential-NIAH: A Needle-In-A-Haystack Benchmark for Extracting Sequential Needles from Long Contexts

Yifei Yu, Qian-Wen Zhang, Lingfeng Qiao, Di Yin, Fang Li, Jie Wang, Chen Zeng Xi, Suncong Zheng, Xiaolong Liang, Xing Sun. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025.

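The Sequential-NIAH title suggests the needles have to be recovered in their original order, not just found. As a loose, assumed sketch (the paper's actual metric is not described in this post; this is just a plain in-order substring check), an order-aware pass/fail test could look like:

```python
def needles_in_order(answer: str, gold_needles: list[str]) -> bool:
    """Return True if every gold needle occurs in `answer` and their
    occurrences respect the gold ordering."""
    cursor = 0
    for needle in gold_needles:
        idx = answer.find(needle, cursor)  # only search after the previous match
        if idx == -1:
            return False
        cursor = idx + len(needle)
    return True

# Hypothetical usage: the second answer lists the steps out of order and fails.
gold = ["step one: mix", "step two: heat", "step three: cool"]
print(needles_in_order("First step one: mix, then step two: heat, last step three: cool.", gold))  # True
print(needles_in_order("step two: heat ... step one: mix ... step three: cool", gold))              # False
```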

 

 

 

 

 
