https://arxiv.org/abs/2006.16668 GShard: Scaling Giant Models with Conditional Computation and Automatic ShardingNeural network scaling has been critical for improving the model quality in many real-world machine learning applications with vast amounts of training data and compute. Although this trend of scaling is affirmed to be a sure-fire approach for better modelarxiv.org 이 논문은 대규모 신경망을..