Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
William Fedus, Barret Zoph, and Noam Shazeer
arXiv e-Print archive, 2021
Keywords:
cs.LG, cs.AI
Abstract: In deep learning, models typically reuse the same parameters for all inputs.
Mixture of Experts (MoE) defies this and instead selects different parameters
for each incoming example. The result is a sparsely-activated model -- with
outrageous numbers of parameters -- but a constant computational cost. However,
despite several notable successes of MoE, widespread adoption has been hindered
by complexity, communication costs and training instability -- we address these
with the Switch Transformer. We simplify the MoE routing algorithm and design
intuitive, improved models with reduced communication and computational costs.
Our proposed training techniques help wrangle the instabilities, and we show
that large sparse models can be trained, for the first time, with lower-precision
(bfloat16) formats. We design models based on T5-Base and T5-Large to obtain
up to 7x increases in pre-training speed with the same computational resources.
These improvements extend into multilingual settings where we measure gains
over the mT5-Base version across all 101 languages. Finally, we advance the
current scale of language models by pre-training up to trillion parameter
models on the "Colossal Clean Crawled Corpus" and achieve a 4x speedup over the
T5-XXL model.
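
To make the top-1 ("switch") routing idea described in the abstract concrete, here is a minimal NumPy sketch: a learned router scores each token against the experts, the token is sent to the single highest-scoring expert feed-forward network, and that expert's output is scaled by the router probability. This is a toy illustration under assumed names and dimensions (router_w, num_experts, d_ff, etc.), not the paper's actual implementation.

```python
# Toy sketch of top-1 ("switch") routing: each token is processed by exactly
# one expert FFN, so parameters grow with the number of experts while
# per-token compute stays roughly constant. Names/dimensions are illustrative.
import numpy as np

rng = np.random.default_rng(0)
num_tokens, d_model, d_ff, num_experts = 8, 16, 32, 4

# Router: a single dense layer producing one logit per expert.
router_w = rng.normal(size=(d_model, num_experts)) * 0.02

# Each expert is an independent two-layer feed-forward network.
experts = [
    (rng.normal(size=(d_model, d_ff)) * 0.02,
     rng.normal(size=(d_ff, d_model)) * 0.02)
    for _ in range(num_experts)
]

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def switch_layer(tokens):
    """Route each token to its single best expert and gate the output."""
    probs = softmax(tokens @ router_w)                 # (tokens, experts)
    expert_idx = probs.argmax(axis=-1)                 # top-1 expert per token
    gate = probs[np.arange(len(tokens)), expert_idx]   # router probability

    out = np.zeros_like(tokens)
    for e, (w_in, w_out) in enumerate(experts):
        mask = expert_idx == e
        if not mask.any():
            continue
        h = np.maximum(tokens[mask] @ w_in, 0.0)       # expert FFN (ReLU)
        out[mask] = gate[mask][:, None] * (h @ w_out)  # gate-scaled output
    return out, expert_idx

tokens = rng.normal(size=(num_tokens, d_model))
out, assignment = switch_layer(tokens)
print("expert assignment per token:", assignment)
```

Routing to a single expert (rather than the top-k of classical MoE) is what the abstract refers to as the simplified routing algorithm: it cuts both the routing computation and the amount of token traffic that must be exchanged between devices holding different experts.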