https://arxiv.org/abs/2403.10949

SelfIE: Self-Interpretation of Large Language Model Embeddings

How do large language models (LLMs) obtain their answers? The ability to explain and control an LLM's reasoning process is key for reliability, transparency, and future model developments. We propose SelfIE (Self-Interpretation of Embeddings), a framework

This paper, unlike Sparse Autoencoders (SAE), additional..