An Engineering Student's Challenge Log (공대생 도전 일지)

Towards Compressive and Scalable Recurrent Memory

https://arxiv.org/abs/2602.11212

Transformers face a quadratic bottleneck in attention when scaling to long contexts. Recent approaches introduce recurrent memory to extend context beyond the current window, yet these often face a fundamental trade-off between theoretical principles and p...

It was submitted to ICLR 2026 but got rejected. Here too, as before, the transformer's long..
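A rough count of query-key pairs shows why the quadratic bottleneck matters and why recurrent memory helps. It compares full attention over N tokens with a generic segment-by-segment scheme in which each segment of S tokens also attends to M carried-over memory slots. This is a hypothetical cost model, not the paper's architecture. It also ignores the cost of updating memory and all constant factors.

```python
def full_attention_pairs(n_tokens: int) -> int:
    """Query-key pairs scored by full self-attention over n tokens: grows as n^2."""
    return n_tokens * n_tokens


def segmented_memory_pairs(n_tokens: int, segment: int, n_mem: int) -> int:
    """Query-key pairs for a hypothetical segment-wise scheme with recurrent memory.

    Each segment attends over its own `segment` tokens plus `n_mem` memory slots
    carried from the previous segment, so each step costs (segment + n_mem)^2 and
    the total grows linearly with the number of segments.
    """
    n_segments = -(-n_tokens // segment)  # ceiling division
    return n_segments * (segment + n_mem) ** 2


if __name__ == "__main__":
    n = 32_768
    print(full_attention_pairs(n))              # 1073741824
    print(segmented_memory_pairs(n, 1024, 64))  # 32 segments * 1088^2 = 37879808
```

The trade-off is that information from earlier segments only survives through the M memory slots. That compression is exactly what recurrent-memory methods have to get right.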

AI/Paper Review or Progress 2026.06.29

TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate

https://arxiv.org/abs/2504.19874

Vector quantization, a problem rooted in Shannon's source coding theory, aims to quantize high-dimensional Euclidean vectors while minimizing distortion in their geometric structure. We propose TurboQuant to address both mean-squared error (MSE) and inner...

Quantization isn't a field I know well, so this one may be a bit hard, but.....
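This sketch makes the two distortions named in the abstract concrete: squared reconstruction error and inner-product error. It is a generic "random rotation, then per-coordinate scalar quantization" baseline, not TurboQuant itself. Two choices are assumptions of this sketch: the uniform quantizer and the per-vector scale (max absolute coordinate).

```python
import numpy as np


def random_rotation(d: int, rng: np.random.Generator) -> np.ndarray:
    """Random d x d orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    # Flip column signs so the result is uniformly (Haar) distributed.
    return q * np.sign(np.diag(r))


def uniform_scalar_quantize(y: np.ndarray, bits: int, bound: float) -> np.ndarray:
    """Round each coordinate to the nearest of 2**bits evenly spaced levels in [-bound, bound]."""
    levels = 2 ** bits
    step = 2 * bound / (levels - 1)
    idx = np.clip(np.round((y + bound) / step), 0, levels - 1)
    return -bound + idx * step


def rotate_and_quantize(x: np.ndarray, rot: np.ndarray, bits: int) -> np.ndarray:
    """Rotate, quantize every coordinate independently, then rotate back.

    The per-vector bound would have to be stored next to the codes; this is an
    assumption of the sketch, not TurboQuant's scheme.
    """
    y = rot @ x
    bound = float(np.abs(y).max())
    return rot.T @ uniform_scalar_quantize(y, bits, bound)


def distortions(x: np.ndarray, x_hat: np.ndarray, q: np.ndarray):
    """Squared L2 reconstruction error ||x - x_hat||^2 and absolute error of <q, x>."""
    sq_err = float(np.sum((x - x_hat) ** 2))
    ip_err = float(abs(q @ x - q @ x_hat))
    return sq_err, ip_err


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 128
    x = rng.standard_normal(d)
    x /= np.linalg.norm(x)          # unit-norm data vector
    q = rng.standard_normal(d)      # query vector for the inner-product check
    rot = random_rotation(d, rng)
    for bits in (2, 4, 8):
        print(bits, distortions(x, rotate_and_quantize(x, rot, bits), q))
```

The rotation is what makes per-coordinate quantization reasonable. After a random rotation, no single coordinate carries a disproportionate share of the vector's energy.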

AI/Paper Review or Progress 2026.06.25

Revising and Falsifying Sparse Autoencoder Feature Explanations

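https://neurips.cc/virtual/2025/loc/san-diego/poster/118303

This paper was accepted to NeurIPS. The existing way of interpreting SAE features works as follows:
1. Collect sentences on which a particular SAE feature activates strongly.
2. Ask an LLM to explain what the feature is responding to.
3. Evaluate the generated explanation with a separate simulator LLM.
The problem is that this approach tends to overgeneralize or to describe an overly broad scope. An explanation looks plausible when you only check it against the top-activating examples, but it breaks easily once you add counterexamples from similar contexts. => Existing SAE feature explanations cover the matching examples well, but their precision at excluding the wrong ones is ..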
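The precision problem becomes concrete if you score an explanation as a binary classifier of feature activation. This minimal sketch assumes the simulator's output has already been thresholded into "fires / doesn't fire". The example data is made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Example:
    text: str
    fires: bool      # ground truth: does the SAE feature actually activate here?
    predicted: bool  # does the explanation (via the simulator) say it should?


def precision_recall(examples: List[Example]) -> Tuple[float, float]:
    """Precision and recall of the explanation's predictions against real activations."""
    tp = sum(e.fires and e.predicted for e in examples)
    fp = sum(e.predicted and not e.fires for e in examples)
    fn = sum(e.fires and not e.predicted for e in examples)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up data: the explanation looks perfect on top-activating examples...
top_activating = [Example(f"top-activating #{i}", True, True) for i in range(3)]
# ...but an overly broad explanation also predicts firing on similar-context
# counterexamples where the feature is actually silent.
near_misses = [
    Example("similar context, feature silent #1", False, True),
    Example("similar context, feature silent #2", False, True),
    Example("similar context, feature silent #3", False, False),
]

print(precision_recall(top_activating))                # (1.0, 1.0)
print(precision_recall(top_activating + near_misses))  # (0.6, 1.0)
```

Recall stays perfect in both cases. Only the near-miss counterexamples reveal that the explanation is too broad.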

AI/Paper Review or Progress 2026.06.24

Do Sparse Autoencoders Identify Reasoning Features in Language Models?

https://arxiv.org/abs/2601.05679

We study how reliably sparse autoencoders (SAEs) support claims about reasoning-related internal features in large language models. We first give a stylized analysis showing that sparsity-regularized decoding can preferentially retain stable low-dimensiona...

The reasoning features found by SAEs actually..
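This is a minimal NumPy sketch of the common SAE formulation: a ReLU encoder, a linear decoder, and an L1 sparsity penalty. It also includes a helper that picks the top-activating inputs for one feature, which is where feature-interpretation pipelines start. The dimensions, initialization, and L1 coefficient are arbitrary assumptions. The SAEs used in the paper may differ, and no training loop is shown.

```python
import numpy as np


class SparseAutoencoder:
    """Minimal SAE over model activations: ReLU encoder, linear decoder, L1 penalty."""

    def __init__(self, d_model: int, d_dict: int, l1_coef: float = 1e-3, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.standard_normal((d_model, d_dict)) / np.sqrt(d_model)
        self.b_enc = np.zeros(d_dict)
        self.W_dec = rng.standard_normal((d_dict, d_model)) / np.sqrt(d_dict)
        self.b_dec = np.zeros(d_model)
        self.l1_coef = l1_coef

    def encode(self, x: np.ndarray) -> np.ndarray:
        """Non-negative feature activations f(x)."""
        return np.maximum(0.0, (x - self.b_dec) @ self.W_enc + self.b_enc)

    def decode(self, f: np.ndarray) -> np.ndarray:
        """Reconstruct activations as a weighted sum of dictionary directions."""
        return f @ self.W_dec + self.b_dec

    def loss(self, x: np.ndarray) -> float:
        """Reconstruction error plus L1 sparsity penalty, averaged over the batch."""
        f = self.encode(x)
        recon = np.sum((x - self.decode(f)) ** 2, axis=-1).mean()
        sparsity = np.sum(np.abs(f), axis=-1).mean()
        return float(recon + self.l1_coef * sparsity)

    def top_activating(self, x: np.ndarray, feature: int, k: int) -> np.ndarray:
        """Indices of the k inputs on which `feature` fires most strongly."""
        acts = self.encode(x)[:, feature]
        return np.argsort(-acts)[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.standard_normal((10, 16))  # stand-in for a batch of model activations
    sae = SparseAutoencoder(d_model=16, d_dict=64)
    print(sae.loss(x), sae.top_activating(x, feature=0, k=3))
```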

AI/Paper Review or Progress 2026.05.24

Retrieval from Within: An Intrinsic Capability of Attention-Based Models

https://arxiv.org/abs/2605.05806

Retrieval-augmented generation (RAG) typically treats retrieval and generation as separate systems. We ask whether an attention-based encoder-decoder can instead retrieve directly from its own internal representations. We introduce INTRA (INTrinsic Retriev...

Conventional RAG uses a separate Retriever model..
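The preview does not show how INTRA scores candidates. As a rough intuition for "retrieving from the model's own representations", this sketch ranks stored documents by scaled dot-product attention between a query vector and one pooled key vector per document. The pooling and the single-vector-per-document setup are assumptions, not INTRA's design.

```python
import numpy as np


def retrieve_by_attention(query: np.ndarray, doc_keys: np.ndarray, k: int):
    """Rank stored documents by scaled dot-product score against a query state.

    query:    (d,) vector, e.g. a pooled hidden state of the question (assumption)
    doc_keys: (n_docs, d) matrix, one pooled key vector per document (assumption)
    Returns the top-k document indices and softmax weights over those k.
    """
    d = query.shape[-1]
    scores = doc_keys @ query / np.sqrt(d)
    top = np.argsort(-scores)[:k]
    logits = scores[top] - scores[top].max()  # subtract max for numerical stability
    weights = np.exp(logits) / np.exp(logits).sum()
    return top, weights


if __name__ == "__main__":
    docs = np.eye(4)                          # four toy document keys
    query = np.array([0.1, 0.0, 0.9, 0.2])    # closest to document 2
    print(retrieve_by_attention(query, docs, k=2))
```

The point of the comparison is that no separate retriever model appears here. The same dot-product scoring that attention already computes does the ranking.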

AI/Paper Review or Progress 2026.05.21

Token Pruning in Multimodal Large Language Models: Are We Solving the Right Problem?

https://arxiv.org/abs/2502.11501

Multimodal large language models (MLLMs) have shown remarkable performance for cross-modal understanding and generation, yet still suffer from severe inference costs. Recently, abundant works have been proposed to solve this problem with token pruning, whi...

This time it's multimodal, so I'm not that drawn..
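For context on what the paper evaluates, this sketch shows one common family of pruning criteria. It keeps the image tokens that receive the most attention and drops the rest. Two things are assumptions here: which attention scores to use (for example, attention from text tokens averaged over heads at one layer) and the keep ratio. This is not a specific method from the paper.

```python
import numpy as np


def prune_visual_tokens(visual_tokens: np.ndarray, attn_scores: np.ndarray, keep_ratio: float):
    """Keep the most-attended visual tokens and drop the rest.

    visual_tokens: (n, d) image-token embeddings
    attn_scores:   (n,) importance score per image token (assumed precomputed)
    Returns the kept tokens (original order preserved) and their indices.
    """
    n = visual_tokens.shape[0]
    n_keep = max(1, int(round(n * keep_ratio)))
    keep = np.sort(np.argsort(-attn_scores)[:n_keep])  # re-sort to keep spatial order
    return visual_tokens[keep], keep


if __name__ == "__main__":
    tokens = np.arange(24).reshape(6, 4)               # six toy image tokens
    attn = np.array([0.1, 0.5, 0.05, 0.3, 0.02, 0.03])
    kept, idx = prune_visual_tokens(tokens, attn, keep_ratio=0.5)
    print(idx)  # [0 1 3]
```

Sorting the kept indices back into their original order preserves the tokens' positional layout for the rest of the model.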

AI/Paper Review or Progress 2026.05.20

Recursive Multi-Agent Systems

https://arxiv.org/abs/2604.25917

Recursive or looped language models have recently emerged as a new scaling axis by iteratively refining the same model computation over latent states to deepen reasoning. We extend such scaling principle from a single model to multi-agent systems, and ask:

Agents communicate in latent space rather than through text-based dialogue. Hidden states are recursively..

AI/Paper Review or Progress 2026.05.15

LIMO: Less is More for Reasoning

https://arxiv.org/abs/2502.03387

We challenge the prevailing assumption that complex reasoning in large language models (LLMs) necessitates massive training data. We demonstrate that sophisticated mathematical reasoning can emerge with only a few examples. Specifically, through simple sup...

What it takes to build a reasoning model is not large-scale SFT data but, within the already pretrained model..

AI/Paper Review or Progress 2026.05.14

s1: Simple test-time scaling

https://arxiv.org/abs/2501.19393

Test-time scaling is a promising new approach to language modeling that uses extra test-time compute to improve performance. Recently, OpenAI's o1 model showed this capability but did not publicly share its methodology, leading to many replication efforts.

Without any RL, this paper runs SFT on 1,000 reasoning traces and then, at inference time, forcibly controls how long the model thinks with bu..
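This is a minimal sketch of forcing the model's thinking length at inference time. Once a token budget is hit, thinking is cut off by appending an end-of-thinking delimiter. If the model tries to stop too early, the delimiter is suppressed and "Wait" is appended so it keeps reasoning. Assumptions: a hypothetical token-level `next_token` callback, a made-up `END_THINK` string, and greedy decoding. The paper's actual delimiters, prompts, and counting rules may differ.

```python
from typing import Callable, List

END_THINK = "<end_of_thinking>"  # hypothetical delimiter; real models use their own token
WAIT = "Wait"                    # appended to push the model to keep thinking


def decode_with_thinking_budget(next_token: Callable[[List[str]], str],
                                prompt: List[str],
                                min_think: int,
                                max_think: int) -> List[str]:
    """Decode the thinking phase while forcing its length into [min_think, max_think]."""
    out = list(prompt)
    n_think = 0
    while True:
        if n_think >= max_think:
            out.append(END_THINK)   # upper bound: force the end of thinking
            return out
        tok = next_token(out)
        if tok == END_THINK and n_think < min_think:
            out.append(WAIT)        # lower bound: suppress the stop and keep going
        elif tok == END_THINK:
            out.append(tok)         # model stopped inside the allowed window
            return out
        else:
            out.append(tok)
        n_think += 1


def eager_stopper(tokens: List[str]) -> str:
    """Toy 'model' that tries to stop right after every reasoning step."""
    return END_THINK if tokens[-1] == "step" else "step"


if __name__ == "__main__":
    print(decode_with_thinking_budget(eager_stopper, ["Q"], min_think=4, max_think=10))
    # ['Q', 'step', 'Wait', 'step', 'Wait', 'step', '<end_of_thinking>']
```

The model itself is never retrained for this. Only the decoding loop decides when thinking is allowed to end.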

AI/Paper Review or Progress 2026.05.14

Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes

https://arxiv.org/abs/2305.02301

Deploying large language models (LLMs) is challenging because they are memory inefficient and compute-intensive for practical applications. In reaction, researchers train smaller task-specific models by either finetuning with human labels or distilling usi..

AI/Paper Review or Progress 2026.05.12
