Summary by David Stutz
Wu and He propose group normalization as an alternative to batch normalization. Instead of computing the statistics used for normalization over the current mini-batch, group normalization computes them per instance, over groups of channels (for convolutional layers). Specifically, given activations $x_i$ with $i = (i_N, i_C, i_H, i_W)$ indexing batch, channel, height and width, batch normalization computes
$\mu_i = \frac{1}{|S|}\sum_{k \in S} x_k$ and $\sigma_i = \sqrt{\frac{1}{|S|} \sum_{k \in S} (x_k - \mu_i)^2 + \epsilon}$
where the set $S$ holds all indices of the same channel (i.e., across samples, height and width). For group normalization, in contrast, $S$ holds all indices belonging to the current instance and the current group of channels, so the statistics are computed across height, width and the channels of the group, independently of the other samples in the mini-batch. The channels can be divided into groups arbitrarily; on ImageNet, the paper uses $32$ groups by default. Figure 1 shows that, for a batch size of $32$, group normalization performs on par with batch normalization, although its validation error is slightly higher. This is attributed to the stochastic element of batch normalization, which acts as a regularizer. Figure 2 additionally shows the influence of the batch size on batch normalization and group normalization.
https://i.imgur.com/lwP5ycw.jpg
Figure 1: Training and validation error for different normalization schemes on ImageNet.
https://i.imgur.com/0c3CnEX.jpg
Figure 2: Validation error for different batch sizes.
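To make the difference in how the statistics are computed concrete, here is a minimal NumPy sketch (not the authors' implementation) for an input of shape $(N, C, H, W)$; the function names `group_norm` and `batch_norm` and the test tensor are my own, and the learnable scale and shift parameters $\gamma, \beta$ from the paper are omitted.

```python
import numpy as np

def group_norm(x, num_groups, eps=1e-5):
    """Group normalization for an input of shape (N, C, H, W).

    Statistics are computed per sample over each group of channels
    and all spatial positions, independent of the batch size.
    """
    n, c, h, w = x.shape
    assert c % num_groups == 0, "channels must be divisible by the number of groups"
    # Reshape so that each group of channels forms one normalization unit.
    x = x.reshape(n, num_groups, c // num_groups, h, w)
    mu = x.mean(axis=(2, 3, 4), keepdims=True)
    var = x.var(axis=(2, 3, 4), keepdims=True)
    x = (x - mu) / np.sqrt(var + eps)
    return x.reshape(n, c, h, w)

def batch_norm(x, eps=1e-5):
    """Batch normalization (training mode, no running statistics):
    statistics are computed per channel across the whole mini-batch."""
    mu = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

# Group normalization gives the same result for a sample
# no matter how many other samples are in the batch.
x = np.random.randn(8, 64, 16, 16).astype(np.float32)
y = group_norm(x, num_groups=32)
print(np.allclose(y[:2], group_norm(x[:2], num_groups=32), atol=1e-5))  # True
```

The final check illustrates why group normalization is insensitive to the batch size: its statistics never mix information across samples, whereas `batch_norm` applied to `x[:2]` would yield different values than on the full batch.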
Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).