Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models
Sergey Ioffe
arXiv e-Print archive, 2017
Keywords: cs.LG
First published: 2017/02/10
Abstract: Batch Normalization is quite effective at accelerating and improving the
training of deep models. However, its effectiveness diminishes when the
training minibatches are small, or do not consist of independent samples. We
hypothesize that this is due to the dependence of model layer inputs on all the
examples in the minibatch, and different activations being produced between
training and inference. We propose Batch Renormalization, a simple and
effective extension to ensure that the training and inference models generate
the same outputs that depend on individual examples rather than the entire
minibatch. Models trained with Batch Renormalization perform substantially
better than batchnorm when training with small or non-i.i.d. minibatches. At
the same time, Batch Renormalization retains the benefits of batchnorm such as
insensitivity to initialization and training efficiency.
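The sketch below illustrates the idea behind Batch Renormalization in NumPy: during training, the minibatch normalization is corrected by two clipped factors (r and d) expressed in terms of the moving statistics, so that training-time outputs stay close to what inference produces from the moving averages alone. The function name, the momentum/r_max/d_max values, and the simple exponential moving-average update are illustrative assumptions rather than the authors' reference implementation; in a real framework, r and d must also be excluded from backpropagation.

```python
# Minimal sketch of a Batch Renormalization forward pass (assumptions noted above).
import numpy as np

def batch_renorm_forward(x, gamma, beta, moving_mean, moving_std,
                         r_max=3.0, d_max=5.0, momentum=0.99,
                         eps=1e-5, training=True):
    """x: array of shape (batch, features); returns (y, moving_mean, moving_std)."""
    if training:
        batch_mean = x.mean(axis=0)
        batch_std = x.std(axis=0) + eps
        # Correction factors that re-express the minibatch normalization in terms
        # of the moving statistics; clipped, and treated as constants in backprop.
        r = np.clip(batch_std / moving_std, 1.0 / r_max, r_max)
        d = np.clip((batch_mean - moving_mean) / moving_std, -d_max, d_max)
        x_hat = (x - batch_mean) / batch_std * r + d
        # Update moving statistics (assumed exponential moving average).
        moving_mean = momentum * moving_mean + (1 - momentum) * batch_mean
        moving_std = momentum * moving_std + (1 - momentum) * batch_std
    else:
        # Inference uses only the moving statistics, so the output for an
        # example does not depend on the rest of the minibatch.
        x_hat = (x - moving_mean) / moving_std
    y = gamma * x_hat + beta
    return y, moving_mean, moving_std
```

With r_max = 1 and d_max = 0 the correction disappears and this reduces to standard batchnorm; the paper relaxes these limits gradually over the course of training.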