
Software (1193)

Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes

https://arxiv.org/abs/2305.02301
Deploying large language models (LLMs) is challenging because they are memory inefficient and compute-intensive for practical applications. In reaction, researchers train smaller task-specific models by either finetuning with human labels or distilling using LLM-generated labels…
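
As a rough illustration of the distillation side of this setup, here is a minimal sketch of a multi-task loss that combines label prediction with rationale generation, which is what the paper's "step-by-step" framing refers to. The function name, the `lam` weight, and the toy tensors are my placeholders, not the authors' code.

```python
# Minimal sketch (assumed names/shapes): label loss + rationale loss, where the
# rationale targets are chain-of-thought strings elicited from a larger LLM.
import torch
import torch.nn.functional as F

def distill_step_by_step_loss(label_logits, label_ids,
                              rationale_logits, rationale_ids, lam=1.0):
    # label_logits:     (batch, label_len, vocab)
    # rationale_logits: (batch, rationale_len, vocab)
    l_label = F.cross_entropy(label_logits.flatten(0, 1), label_ids.flatten())
    l_rationale = F.cross_entropy(rationale_logits.flatten(0, 1),
                                  rationale_ids.flatten())
    return l_label + lam * l_rationale

# Toy usage with random tensors, just to show the shapes involved.
B, L, V = 2, 8, 100
loss = distill_step_by_step_loss(
    torch.randn(B, L, V), torch.randint(V, (B, L)),
    torch.randn(B, L, V), torch.randint(V, (B, L)),
)
print(loss.item())
```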

Adapting Language Models to Compress Contexts

https://arxiv.org/abs/2305.14788
Transformer-based language models (LMs) are powerful and widely-applicable tools, but their usefulness is constrained by a finite context window and the expensive computational cost of processing long text documents. We propose to adapt pre-trained LMs into AutoCompressors…
This paper, too, deals with the LLM's limited context window and how resource-intensive long contexts are…
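
The abstract's idea of compressing long contexts into compact summary vectors can be sketched roughly as below. `lm_step` stands in for a transformer forward pass, and the segment and summary sizes are made-up numbers; this is my paraphrase of the recursive loop, not the paper's implementation.

```python
# Hedged sketch: each segment is processed with the previous summary vectors
# prepended and summary-token embeddings appended; the outputs at the
# summary-token positions become the summary for the next segment.
import torch

D, N_SUMMARY = 64, 4                      # hidden size, summary tokens (assumed)
summary_tok = torch.randn(N_SUMMARY, D)   # stand-in for learned summary embeddings

def lm_step(token_embs, summary_vecs):
    # Stand-in for a transformer forward pass over [summary; segment; summary_tok].
    x = torch.cat([summary_vecs, token_embs, summary_tok], dim=0)
    return torch.tanh(x)

def compress(segments):
    summary = torch.zeros(N_SUMMARY, D)
    for seg in segments:
        hidden = lm_step(seg, summary)
        summary = hidden[-N_SUMMARY:]     # outputs at the summary-token positions
    return summary                        # compact vectors for the whole context

doc = [torch.randn(16, D) for _ in range(3)]  # 3 segments of 16 "tokens"
print(compress(doc).shape)                    # torch.Size([4, 64])
```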

LCIRC: A Recurrent Compression Approach for Efficient Long-form Context and Query Dependent Modeling in LLMs

https://arxiv.org/abs/2502.06139
While large language models (LLMs) excel in generating coherent and contextually rich outputs, their capacity to efficiently handle long-form contexts is limited by fixed-length position embeddings. Additionally, the computational cost of processing long…
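
Going only by the title's "recurrent compression", the shape of the idea is a fixed-size memory updated chunk by chunk, so cost grows linearly with context length rather than quadratically. The sketch below is purely illustrative: the module, slot count, and update rule are my assumptions, not LCIRC's architecture.

```python
# Hedged sketch: a fixed-size memory attends to each chunk in turn, so the
# compressed representation never grows with the input length.
import torch
import torch.nn as nn

D, M = 64, 8  # hidden size, memory slots (assumed)

class RecurrentCompressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.attn = nn.MultiheadAttention(D, num_heads=4, batch_first=True)
        self.memory0 = nn.Parameter(torch.randn(1, M, D))

    def forward(self, chunks):
        mem = self.memory0
        for chunk in chunks:                     # each chunk: (1, chunk_len, D)
            upd, _ = self.attn(mem, chunk, chunk)  # memory attends to the chunk
            mem = mem + upd                      # residual update, size stays fixed
        return mem                               # (1, M, D) compressed context

chunks = [torch.randn(1, 32, D) for _ in range(5)]
print(RecurrentCompressor()(chunks).shape)       # torch.Size([1, 8, 64])
```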

R1-Compress: Long Chain-of-Thought Compression via Chunk Compression and Search

https://arxiv.org/abs/2505.16838
Chain-of-Thought (CoT) reasoning enhances large language models (LLMs) by enabling step-by-step problem-solving, yet its extension to Long-CoT introduces substantial computational overhead due to increased token length. Existing compression approaches…
I've been looking at papers like this one that compress while generating…
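
The "chunk compression and search" in the title suggests a control flow like the following: split the long CoT into chunks, propose several compressed rewrites per chunk, and keep the candidate that scores best. The placeholder functions below stand in for LLM calls; none of this is the authors' code.

```python
# Hedged control-flow sketch of chunk-level compression plus candidate search.

def split_into_chunks(cot: str, size: int = 200) -> list:
    return [cot[i:i + size] for i in range(0, len(cot), size)]

def propose_compressions(chunk: str, n: int = 3) -> list:
    # Placeholder: in a real system an LLM would rewrite the chunk n ways.
    return [chunk[: max(1, len(chunk) // k)] for k in range(1, n + 1)]

def score(candidate: str, context: str) -> float:
    # Placeholder: a real scorer would check that the next reasoning step
    # still follows from `context + candidate`. Here, shorter is better.
    return -len(candidate)

def compress_cot(cot: str) -> str:
    context = ""
    for chunk in split_into_chunks(cot):
        best = max(propose_compressions(chunk), key=lambda c: score(c, context))
        context += best  # keep the best candidate, move to the next chunk
    return context

print(len(compress_cot("step " * 500)), "chars after compression")
```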

OSCAR: Online Soft Compression And Reranking

https://arxiv.org/abs/2504.07109
Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by integrating external knowledge, leading to improved accuracy and relevance. However, scaling RAG pipelines remains computationally expensive as retrieval sizes grow. To address…
Another token-compression paper from NAVER LABS Europe. 2025.08.1…
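
Reading just the abstract, "online soft compression and reranking" suggests compressing each retrieved passage into a few soft vectors and ordering passages by how well those vectors match the query. The sketch below uses a random projection as a stand-in for the learned compressor; every name and number in it is an assumption of mine.

```python
# Hedged sketch: compress passages to K soft vectors, rerank by query match.
import torch

D, K = 64, 4                      # embedding dim, soft tokens per passage (assumed)
proj = torch.randn(D, D)          # stand-in for a learned compressor

def soft_compress(passage_embs):  # (length, D) -> (K, D)
    return (passage_embs @ proj)[:K]

def rerank(query_emb, passages):
    scored = []
    for p in passages:
        soft = soft_compress(p)
        s = (soft @ query_emb).max().item()  # best soft-token match to the query
        scored.append((s, soft))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [soft for _, soft in scored]      # compressed passages, best first

q = torch.randn(D)
docs = [torch.randn(20, D) for _ in range(5)]
ranked = rerank(q, docs)
print(len(ranked), "passages, each compressed to", ranked[0].shape)
```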

ACL 2026 main: Towards Privacy-Preserving Large Language Model: Text-free Inference Through Alignment and Adaptation

Originally I planned to release the model and the code as well… but since that isn't possible… I've removed the code that had been drafted here and will post the presentation slides and the paper instead.
https://arxiv.org/abs/2604.06831
Current LLM-based services typically require users to submit raw text regardless of its sensitivity. While intuitive, such practice introduces substantial privacy risks, as unauthorized…

Sequential Efficient LLM Papers - 3

https://aclanthology.org/2024.acl-long.536/
Dodo: Dynamic Contextual Compression for Decoder-only LMs. Guanghui Qin, Corby Rosset, Ethan Chau, Nikhil Rao, Benjamin Van Durme. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2024.
This paper was accepted to ACL 2024 (long). Existing methods (sparse attention, kernels, etc.) either do not show consistent gains in NLP, or applying them to large LLMs…

Sequential Efficient LLM Papers - 2

https://arxiv.org/abs/2310.01732
Nugget: Neural Agglomerative Embeddings of Text. Embedding text sequences is a widespread requirement in modern language understanding. Existing approaches focus largely on constant-size representations. This is problematic, as the amount of information contained in text often varies with the length of the input…
Fixed-length embeddings have to compress to the same size even when sentence length and information content differ, so with long texts, information…
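
The commentary's point about fixed-length embeddings is exactly what a length-proportional scheme addresses: keep k token states with k scaling with input length, so longer texts get larger representations. The scorer and keep-ratio below are illustrative assumptions of mine, not Nugget's actual components.

```python
# Hedged sketch: select the top-scoring tokens, with the count proportional
# to input length, so the representation grows with the text.
import torch

D, RATIO = 64, 0.1                 # hidden size, fraction of tokens kept (assumed)
scorer = torch.randn(D)            # stand-in for a learned token scorer

def nugget_embed(token_states):    # (length, D)
    k = max(1, int(RATIO * token_states.shape[0]))
    scores = token_states @ scorer                 # one score per token
    idx = scores.topk(k).indices.sort().values     # keep original order
    return token_states[idx]                       # (k, D), k grows with length

short, long = torch.randn(50, D), torch.randn(500, D)
print(nugget_embed(short).shape, nugget_embed(long).shape)
# torch.Size([5, 64]) torch.Size([50, 64])
```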
