
MoE (3 posts)

Monet: Mixture of Monosemantic Experts for Transformers - Paper Review

https://arxiv.org/abs/2412.04139
Understanding the internal computations of large language models (LLMs) is crucial for aligning them with human values and preventing undesirable behaviors like toxic content generation. However, mechanistic interpretability is hindered by polysemanticity ... (arxiv.org)
At first I thought this was just a plain MoE with an SAE attached, but it develops MoE as far as it can go and adds ..

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer - Paper Review

https://arxiv.org/abs/1701.06538
The capacity of a neural network to absorb information is limited by its number of parameters. Conditional computation, where parts of the network are active on a per-example basis, has been proposed in theory as a way of dramatically increasing model capa ... (arxiv.org)
Here, computing all of the experts ..
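The preview cuts off here, but to make the comparison concrete, below is a minimal PyTorch sketch of a sparsely-gated top-k MoE layer, written from the paper's general idea rather than its actual code: a gating network scores the experts per token, only the top-k experts are run, and their outputs are mixed with renormalized gate weights. The class name, dimensions, and k are placeholders; the paper's noisy gating and load-balancing losses are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model=64, d_hidden=128, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, num_experts)          # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                                    # x: (tokens, d_model)
        logits = self.gate(x)                                # (tokens, num_experts)
        topk_val, topk_idx = logits.topk(self.k, dim=-1)     # keep only the k best experts
        weights = F.softmax(topk_val, dim=-1)                # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):                           # evaluate only the selected experts
            idx = topk_idx[:, slot]
            for e in idx.unique().tolist():
                mask = idx == e                              # tokens routed to expert e in this slot
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

x = torch.randn(4, 64)
print(SparseMoE()(x).shape)                                  # torch.Size([4, 64])
```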

Learning Factored Representations in a Deep Mixture of Experts - Paper Review

https://arxiv.org/abs/1312.4314
Mixtures of Experts combine the outputs of several "expert" networks, each of which specializes in a different part of the input space. This is achieved by training a "gating" network that maps each input to a distribution over the experts. Such models sho ... (arxiv.org)
Whereas the usual MoE applied the mixture at a single layer, here a Deep ..
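For the single-layer vs. deep contrast the note gestures at, here is a minimal sketch (my own simplification, not the paper's implementation) of stacking two mixture-of-experts layers, each with its own gating network. In this dense variant every expert is evaluated and the gate only re-weights their outputs, which is the "compute all experts" setup the previous review contrasts with.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseMoELayer(nn.Module):
    """Every expert is evaluated; the gate only re-weights their outputs."""
    def __init__(self, d_in, d_out, num_experts=4):
        super().__init__()
        self.gate = nn.Linear(d_in, num_experts)
        self.experts = nn.ModuleList(nn.Linear(d_in, d_out) for _ in range(num_experts))

    def forward(self, x):                                        # x: (batch, d_in)
        w = F.softmax(self.gate(x), dim=-1)                      # (batch, num_experts)
        outs = torch.stack([e(x) for e in self.experts], dim=1)  # (batch, num_experts, d_out)
        return (w.unsqueeze(-1) * outs).sum(dim=1)               # gated mixture: (batch, d_out)

# Stacking two gated layers gives a per-layer choice of experts,
# i.e. a factored combination instead of one global expert assignment.
deep_moe = nn.Sequential(DenseMoELayer(32, 64), nn.ReLU(), DenseMoELayer(64, 10))
print(deep_moe(torch.randn(8, 32)).shape)                        # torch.Size([8, 10])
```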
