https://arxiv.org/abs/1312.4314
Learning Factored Representations in a Deep Mixture of Experts

From the abstract: "Mixtures of Experts combine the outputs of several 'expert' networks, each of which specializes in a different part of the input space. This is achieved by training a 'gating' network that maps each input to a distribution over the experts."

Whereas the original MoE applies the mixture at a single layer, this paper stacks MoE over multiple layers (a Deep Mixture of Experts), with a separate gating network at each layer, so each input is routed through a combination of experts and the number of effective expert paths multiplies across layers.
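A minimal PyTorch sketch of the idea (not the paper's exact architecture; the layer sizes, the ReLU between layers, and the name MoELayer are illustrative): each layer mixes its experts with a softmax gate conditioned on that layer's input, and stacking two such layers gives a Deep MoE.

```python
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    """One mixture-of-experts layer: a softmax gate mixes expert outputs."""
    def __init__(self, d_in, d_out, n_experts):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Linear(d_in, d_out) for _ in range(n_experts)]
        )
        # Gating network: maps each input to a distribution over the experts.
        self.gate = nn.Linear(d_in, n_experts)

    def forward(self, x):
        weights = torch.softmax(self.gate(x), dim=-1)             # (batch, n_experts)
        outs = torch.stack([e(x) for e in self.experts], dim=1)   # (batch, n_experts, d_out)
        return (weights.unsqueeze(-1) * outs).sum(dim=1)          # gate-weighted mixture

# Deep MoE: two stacked MoE layers, each with its own gate, so an input is
# assigned a combination of experts (4 x 4 = 16 effective expert paths here).
deep_moe = nn.Sequential(
    MoELayer(784, 128, n_experts=4),
    nn.ReLU(),
    MoELayer(128, 10, n_experts=4),
)

x = torch.randn(32, 784)   # e.g. a batch of flattened MNIST images
logits = deep_moe(x)       # (32, 10)
```

Because each layer gates independently, the model can factor its specialization; the paper reports that on translated MNIST the first layer's experts become location-dependent ("where") and the second layer's become class-specific ("what").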