One problem in training deep networks is that the features of the lower layers change while the upper layers have already been adjusted to the previous lower-layer features. This phenomenon of changing input distributions during optimization is called *internal covariate shift*. Batch normalization is applied at training time to each mini-batch.

## Ideas

* Training converges faster if the input is whitened (zero mean, unit variance, decorrelated).
* Normalization parameters have to be computed within the gradient calculation step to prevent the model from blowing up.

## What Batch Normalization is

For a layer with $d$-dimensional input $x = (x^{(1)}, \dots, x^{(d)})$, each dimension is normalized:

$$\hat{x}^{(k)} = \frac{x^{(k)} - \mathbb{E}[x^{(k)}]}{\sqrt{\operatorname{Var}[x^{(k)}]}}$$

where the expectation and the variance are computed over the training data set (in practice, over each mini-batch during training). This does *not* decorrelate the features, though.

Additionally, for each activation $x^{(k)}$ two parameters $\gamma^{(k)}, \beta^{(k)}$ are introduced which scale and shift the normalized feature:

$$y^{(k)} = \gamma^{(k)} \cdot \hat{x}^{(k)} + \beta^{(k)}$$

Those two parameters (per feature) are learnable! A minimal code sketch of this transform is given at the end of this summary.

## Effect of Batch Normalization

* Higher learning rates can be used.
* Initialization is less important.
* Acts as a regularizer, eliminating the need for dropout in some cases.
* Faster training.

## Datasets

* Reaches 4.9% top-5 validation error (and 4.8% test error) on ImageNet classification.

## Used by

* [Going Deeper with Convolutions](http://www.shortscience.org/paper?bibtexKey=journals/corr/SzegedyLJSRAEVR14)
* [Deep Residual Learning for Image Recognition](http://www.shortscience.org/paper?bibtexKey=journals/corr/HeZRS15#martinthoma)

## See also

* [other summaries](http://www.shortscience.org/paper?bibtexKey=conf/icml/IoffeS15)
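
## Code sketch

A minimal NumPy sketch of the per-mini-batch transform described above. The function name, the array shapes, and the small constant `eps` (added to the variance for numerical stability, as in the paper) are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift.

    x     : array of shape (batch_size, d) -- mini-batch of layer inputs
    gamma : array of shape (d,)            -- learnable scale per feature
    beta  : array of shape (d,)            -- learnable shift per feature
    """
    mean = x.mean(axis=0)                    # E[x^(k)] over the mini-batch
    var = x.var(axis=0)                      # Var[x^(k)] over the mini-batch
    x_hat = (x - mean) / np.sqrt(var + eps)  # normalized activations
    return gamma * x_hat + beta              # y^(k) = gamma^(k) * x_hat^(k) + beta^(k)

# Usage: a random mini-batch of 32 examples with 4 features.
x = np.random.randn(32, 4) * 3.0 + 1.0
y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0), y.var(axis=0))  # approximately 0 and 1 per feature
```

At inference time the paper replaces the mini-batch statistics with population estimates (e.g. moving averages collected during training), so the output becomes a deterministic function of the input.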