Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, Jeff Dean
arXiv e-Print archive, 2017
Keywords: cs.LG, cs.CL, cs.NE, stat.ML
First published: 2017/01/23
Abstract: The capacity of a neural network to absorb information is limited by its
number of parameters. Conditional computation, where parts of the network are
active on a per-example basis, has been proposed in theory as a way of
dramatically increasing model capacity without a proportional increase in
computation. In practice, however, there are significant algorithmic and
performance challenges. In this work, we address these challenges and finally
realize the promise of conditional computation, achieving greater than 1000x
improvements in model capacity with only minor losses in computational
efficiency on modern GPU clusters. We introduce a Sparsely-Gated
Mixture-of-Experts layer (MoE), consisting of up to thousands of feed-forward
sub-networks. A trainable gating network determines a sparse combination of
these experts to use for each example. We apply the MoE to the tasks of
language modeling and machine translation, where model capacity is critical for
absorbing the vast quantities of knowledge available in the training corpora.
We present model architectures in which a MoE with up to 137 billion parameters
is applied convolutionally between stacked LSTM layers. On large language
modeling and machine translation benchmarks, these models achieve significantly
better results than state-of-the-art at lower computational cost.
An NLP paper.
> "conditional computation, achieving greater than 1000x improvements in model capacity with
only minor losses in computational efficiency on modern GPU clusters. We introduce
a Sparsely-Gated Mixture-of-Experts layer (MoE), consisting of up to
thousands of feed-forward sub-networks"
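Below is a minimal NumPy sketch of the sparsely-gated MoE idea quoted above: a trainable gating network produces noisy logits per example, only the top-k experts are kept, and their softmaxed weights mix the selected experts' outputs. The toy dimensions and helper names (`moe_layer`, `W_gate`, `W_noise`) are illustrative assumptions, not the paper's code or settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative only, far smaller than the paper's models).
d_model, d_hidden, n_experts, k = 16, 32, 8, 2

# Each expert is a small feed-forward network: ReLU(x W1) W2.
experts = [
    (rng.standard_normal((d_model, d_hidden)) * 0.1,
     rng.standard_normal((d_hidden, d_model)) * 0.1)
    for _ in range(n_experts)
]

# Gating parameters: clean logits plus a learned noise scale
# (the paper's "noisy top-k gating"; randomly initialized here).
W_gate = rng.standard_normal((d_model, n_experts)) * 0.1
W_noise = rng.standard_normal((d_model, n_experts)) * 0.1


def softplus(x):
    return np.log1p(np.exp(x))


def moe_layer(x):
    """Apply the sparsely-gated MoE to a single input vector x."""
    # Noisy gate logits: H(x) = x W_gate + eps * softplus(x W_noise).
    logits = x @ W_gate + rng.standard_normal(n_experts) * softplus(x @ W_noise)

    # Keep only the top-k logits; all other experts get weight exactly 0,
    # so only k experts need to be evaluated for this example.
    top = np.argsort(logits)[-k:]
    weights = np.zeros(n_experts)
    weights[top] = np.exp(logits[top] - logits[top].max())
    weights[top] /= weights[top].sum()  # softmax over the kept logits

    # Sparse combination of the selected experts' outputs.
    y = np.zeros(d_model)
    for i in top:
        W1, W2 = experts[i]
        y += weights[i] * (np.maximum(x @ W1, 0.0) @ W2)
    return y


y = moe_layer(rng.standard_normal(d_model))
print(y.shape)  # (16,)
```

In the paper the layer is applied to every position between stacked LSTM layers, and an auxiliary load-balancing loss keeps expert utilization roughly even; that loss is omitted in this sketch.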
## Evaluation
* 1 Billion Word language modeling benchmark
* 100 billion word Google News corpus