Layer Normalization on ShortScience.org

arxiv.org
arxiv-vanity.com
scholar.google.com

Layer Normalization
Jimmy Lei Ba and Jamie Ryan Kiros and Geoffrey E. Hinton
arXiv e-Print archive - 2016 via Local arXiv
Keywords: stat.ML, cs.LG
more

Summaries/Notes 2

[link] Summary by Denny Britz 8 years ago

TLDR; The authors propose a new normalization scheme called "Layer Normalization" that works especially well for recurrent networks. Layer Normalization is similar to Batch Normalization, but only depends on a single training case. As such, it's well suited for variable length sequences or small batches. In Layer Normalization each hidden unit shares the same normalization term. The authors show through experiments that Layer Normalization converges faster, and sometimes to better solutions, than batch- or unnormalized RNNs. Batch normalization still performs better for CNNs.

Your comment: