
All posts (1198)

Retrieval from Within: An Intrinsic Capability of Attention-Based Models

https://arxiv.org/abs/2605.05806
Abstract: Retrieval-augmented generation (RAG) typically treats retrieval and generation as separate systems. We ask whether an attention-based encoder-decoder can instead retrieve directly from its own internal representations. We introduce INTRA (INTrinsic Retriev..
Note: Conventional RAG uses a separate Retriever model..

Token Pruning in Multimodal Large Language Models: Are We Solving the Right Problem?

https://arxiv.org/abs/2502.11501
Abstract: Multimodal large language models (MLLMs) have shown remarkable performance for cross-modal understanding and generation, yet still suffer from severe inference costs. Recently, abundant works have been proposed to solve this problem with token pruning, whi..
Note: This one is multimodal, so it is not all that appealing..

Recursive Multi-Agent Systems

https://arxiv.org/abs/2604.25917
Abstract: Recursive or looped language models have recently emerged as a new scaling axis by iteratively refining the same model computation over latent states to deepen reasoning. We extend such scaling principle from a single model to multi-agent systems, and ask: ..
Note: Agents communicate through latent space rather than text-based dialogue. Hidden states are recursively..

LIMO: Less is More for Reasoning

https://arxiv.org/abs/2502.03387
Abstract: We challenge the prevailing assumption that complex reasoning in large language models (LLMs) necessitates massive training data. We demonstrate that sophisticated mathematical reasoning can emerge with only a few examples. Specifically, through simple sup..
Note: What is needed to build a reasoning model is not large-scale SFT data but, inside the already pretrained model..

s1: Simple test-time scaling

https://arxiv.org/abs/2501.19393
Abstract: Test-time scaling is a promising new approach to language modeling that uses extra test-time compute to improve performance. Recently, OpenAI's o1 model showed this capability but did not publicly share its methodology, leading to many replication efforts.
Note: This paper runs SFT on 1,000 reasoning traces without any RL and, at inference time, forcibly controls the model's thinking length with bu..

Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes

https://arxiv.org/abs/2305.02301
Abstract: Deploying large language models (LLMs) is challenging because they are memory inefficient and compute-intensive for practical applications. In reaction, researchers train smaller task-specific models by either finetuning with human labels or distilling usi..

Adapting Language Models to Compress Contexts

https://arxiv.org/abs/2305.14788
Abstract: Transformer-based language models (LMs) are powerful and widely-applicable tools, but their usefulness is constrained by a finite context window and the expensive computational cost of processing long text documents. We propose to adapt pre-trained LMs int..
Note: This paper, too, is about the limited context window of LLMs and the heavy resource cost of long contexts..

LCIRC: A Recurrent Compression Approach for Efficient Long-form Context and Query Dependent Modeling in LLMs

https://arxiv.org/abs/2502.06139
Abstract: While large language models (LLMs) excel in generating coherent and contextually rich outputs, their capacity to efficiently handle long-form contexts is limited by fixed-length position embeddings. Additionally, the computational cost of processing long sa..

R1-Compress: Long Chain-of-Thought Compression via Chunk Compression and Search

https://arxiv.org/abs/2505.16838
Abstract: Chain-of-Thought (CoT) reasoning enhances large language models (LLMs) by enabling step-by-step problem-solving, yet its extension to Long-CoT introduces substantial computational overhead due to increased token length. Existing compression approaches -- i..
Note: I have been reading papers that generate while compressing, and others along those lines..
