Getting Started with Sparse Autoencoders
This all kicked off with the idea: let's use SAEs to tweak and manipulate what's going on inside an LLM!
https://transformer-circuits.pub/2024/scaling-monosemanticity/
Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet (Templeton et al., Anthropic)
This is the paper where the whole idea started!
The SAE hype that began with the Golden Gate Bridge!
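Just to pin the idea down for myself: you encode a residual-stream activation into sparse features, clamp the feature you care about (the famous Golden Gate one, in Anthropic's demo), and decode it back into the model. Below is a minimal toy sketch of that loop; the SAE, its dimensions, and the feature index are all made up for illustration, not from any real release.

```python
# Toy sketch of "Golden Gate" style feature steering.
# Everything here (dims, feature index) is hypothetical.
import torch
import torch.nn as nn

class TinySAE(nn.Module):
    """Minimal sparse autoencoder over a residual-stream vector."""
    def __init__(self, d_model=64, d_sae=512):
        super().__init__()
        self.enc = nn.Linear(d_model, d_sae)
        self.dec = nn.Linear(d_sae, d_model)

    def encode(self, x):
        return torch.relu(self.enc(x))   # sparse feature activations

    def decode(self, f):
        return self.dec(f)               # reconstructed activation

sae = TinySAE()
x = torch.randn(1, 64)                   # stand-in for a residual activation

f = sae.encode(x)
FEATURE_IDX = 42                         # hypothetical "Golden Gate" feature
f_steered = f.clone()
f_steered[:, FEATURE_IDX] = 10.0         # clamp the feature to a high value

x_steered = sae.decode(f_steered)        # write this back into the model
print((x_steered - sae.decode(f)).norm())  # steering shifted the activation
```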
https://jbloomaus.github.io/SAELens/
SAELens: a codebase for training sparse autoencoders and analysing SAEs and neural network internals
I'm planning to learn from this guy's docs.
https://github.com/jbloomAus/SAELens
GitHub - jbloomAus/SAELens: Training Sparse Autoencoders on Language Models
If you check the GitHub repo, it even has well-made tutorials.
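From what the tutorials show, loading a pretrained SAE looks roughly like this. The release and sae_id strings come from the SAELens GPT-2 tutorial, but the exact return signature has changed across versions, so treat this as a sketch to check against the current docs.

```python
# Rough sketch based on the SAELens tutorial; in the versions I've seen,
# from_pretrained returns (sae, cfg_dict, sparsity), but verify against docs.
from sae_lens import SAE

sae, cfg_dict, sparsity = SAE.from_pretrained(
    release="gpt2-small-res-jb",       # pretrained GPT-2 residual-stream SAEs
    sae_id="blocks.8.hook_resid_pre",  # which hook point the SAE was trained on
    device="cpu",
)
print(sae.cfg.d_in, sae.cfg.d_sae)     # model dim vs. dictionary size
```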
https://huggingface.co/jbloom/Gemma-2b-Residual-Stream-SAEs
jbloom/Gemma-2b-Residual-Stream-SAEs · Hugging Face (a "quick and dirty" release to unblock researchers; not yet extensively studied)
The models are published on Hugging Face too.
These SAEs were trained on the residual stream, and apparently the data ends up much smaller: even though the SAE dictionary is wider than the residual stream, only a few features fire per token.
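Here's a rough sketch of what that sparsity looks like in practice, chaining the loading pattern above. I use GPT-2 small instead of Gemma since it runs anywhere; the hook name follows the standard TransformerLens convention, and all of it is an assumption to verify locally.

```python
# Sketch: count how few SAE features actually fire per token
# on residual-stream activations. APIs as in SAELens / TransformerLens docs.
import torch
from sae_lens import SAE
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
sae, _, _ = SAE.from_pretrained(
    release="gpt2-small-res-jb", sae_id="blocks.8.hook_resid_pre", device="cpu"
)

_, cache = model.run_with_cache("The Golden Gate Bridge is in San Francisco")
resid = cache["blocks.8.hook_resid_pre"]   # [batch, seq, d_model]
feats = sae.encode(resid)                  # [batch, seq, d_sae]

# Only a handful of the d_sae features are nonzero per token:
# that is the "data gets much smaller" claim in practice.
active = (feats > 0).float().sum(-1).mean().item()
print(f"~{active:.0f} active features out of {feats.shape[-1]}")
```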
The rest of the links below are saved here for reference.
https://huggingface.co/google/gemma-scope
google/gemma-scope · Hugging Face: Gemma Scope, a comprehensive open suite of sparse autoencoders for Gemma 2 9B and 2B
https://www.neuronpedia.org
Neuronpedia: Open Interpretability Platform
Apparently this is where the models get published.
https://github.com/EleutherAI/sae
GitHub - EleutherAI/sae: Sparse autoencoders
There's one more SAE codebase, but it doesn't look all that beginner-friendly...
https://transformer-circuits.pub/2024/april-update/index.html
Circuits Updates - April 2024 (developing ideas from the Anthropic interpretability team)
https://www.anthropic.com/research#interpretability
Anthropic Research (interpretability section)
Note to self: keep checking the updates here too.
https://www.salesforceairesearch.com/crm-benchmark
Generative AI Benchmark for CRM | Salesforce AI Research
Check the benchmark scores and pick a model that's small but good.
https://huggingface.co/collections/HuggingFaceTB/smollm-6695016cad7167254ce15966
🪐 SmolLM - a HuggingFaceTB Collection: a series of smol LLMs (135M, 360M, 1.7B) with base and Instruct models, the training corpus, and some WebGPU demos
This one will probably end up being the model I use...
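A quick smoke test I'd run first, assuming the standard transformers API; the checkpoint name is my guess at the 135M base model from the collection above, so double-check it on the Hub.

```python
# Quick sanity check that SmolLM loads and generates.
# Checkpoint name assumed from the HuggingFaceTB collection.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "HuggingFaceTB/SmolLM-135M"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

ids = tok("Sparse autoencoders are", return_tensors="pt")
out = model.generate(**ids, max_new_tokens=20)
print(tok.decode(out[0], skip_special_tokens=True))
```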