mixup: Beyond Empirical Risk Minimization
Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin and David Lopez-Paz
arXiv e-Print archive, 2017
Keywords: cs.LG, stat.ML
First published: 2017/10/25

Abstract: Large deep neural networks are powerful, but exhibit undesirable behaviors
such as memorization and sensitivity to adversarial examples. In this work, we
propose mixup, a simple learning principle to alleviate these issues. In
essence, mixup trains a neural network on convex combinations of pairs of
examples and their labels. By doing so, mixup regularizes the neural network to
favor simple linear behavior in-between training examples. Our experiments on
the ImageNet-2012, CIFAR-10, CIFAR-100, Google commands and UCI datasets show
that mixup improves the generalization of state-of-the-art neural network
architectures. We also find that mixup reduces the memorization of corrupt
labels, increases the robustness to adversarial examples, and stabilizes the
training of generative adversarial networks.
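Written out, the convex combination described in the abstract builds each virtual training example from two examples $(x_i, y_i)$ and $(x_j, y_j)$ drawn at random from the training data, with one-hot label vectors $y$ and a single mixing coefficient $\lambda$ per pair (a restatement for reference, not additional method):

$$
\tilde{x} = \lambda x_i + (1 - \lambda)\, x_j, \qquad
\tilde{y} = \lambda y_i + (1 - \lambda)\, y_j, \qquad
\lambda \sim \mathrm{Beta}(\alpha, \alpha),\quad \alpha \in (0, \infty)
$$

The hyperparameter $\alpha$ controls how strongly examples are interpolated.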
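A very cheap data-augmentation method: at every training step, pair up training examples at random, draw $\lambda \sim \mathrm{Beta}(\alpha, \alpha)$, and train on the linear interpolation of both the inputs x and their one-hot labels y.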
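```python
import numpy as np

# Assumes loader1 and loader2 yield the training set in independent random orders,
# y1 and y2 are one-hot label tensors, and `loss` accepts soft (mixed) targets.
for (x1, y1), (x2, y2) in zip(loader1, loader2):
    lam = np.random.beta(alpha, alpha)  # mixing coefficient, lam ~ Beta(alpha, alpha)
    x = lam * x1 + (1. - lam) * x2      # mix the inputs
    y = lam * y1 + (1. - lam) * y2      # mix the labels with the same lam
    optimizer.zero_grad()
    loss(net(x), y).backward()
    optimizer.step()
```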
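The loop above relies on two loaders and an existing `net`, `optimizer` and `loss`. As a self-contained illustration, here is a minimal NumPy sketch of the same interpolation applied to a single minibatch. Instead of a second loader, it pairs the batch with a randomly permuted copy of itself; that pairing is an assumption swapped in for the two-loader setup, and `mixup_batch` and `one_hot` are hypothetical helper names.

```python
import numpy as np


def mixup_batch(x, y, alpha, rng):
    """Mix a minibatch with a randomly permuted copy of itself (hypothetical helper).

    x: array of shape (batch, ...) with input features.
    y: array of shape (batch, num_classes) with one-hot labels.
    alpha: Beta(alpha, alpha) concentration; must be > 0.
    rng: a numpy.random.Generator.
    """
    lam = rng.beta(alpha, alpha)          # one mixing coefficient for the whole batch
    perm = rng.permutation(x.shape[0])    # pairs example i with example perm[i]
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]
    return x_mix, y_mix, lam


def one_hot(labels, num_classes):
    """Convert integer class labels to one-hot rows (hypothetical helper)."""
    return np.eye(num_classes)[labels]


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))
y = one_hot(rng.integers(0, 4, size=8), num_classes=4)
x_mix, y_mix, lam = mixup_batch(x, y, alpha=0.2, rng=rng)
print(lam, y_mix.sum(axis=1))  # every mixed label row still sums to 1
```

Because inputs and labels share the same λ, each row of `y_mix` stays a valid probability vector. This is why the training loss has to accept soft targets rather than integer class indices.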
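- ERM (Empirical Risk Minimization) is the $\alpha \to 0$ limit of mixup: $\mathrm{Beta}(\alpha, \alpha)$ then puts all its mass on $\lambda \in \{0, 1\}$, so every "mixed" example is just one of the original pair, i.e. no mixup at all.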
- Reduces the memorization of corrupt labels.
- Increases robustness to adversarial examples.
- Stabilizes the training of GANs.
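The claim that ERM is the $\alpha \to 0$ limit can be checked numerically. The sketch below (the function name, `eps` tolerance and sample size are illustrative assumptions) measures how often $\lambda \sim \mathrm{Beta}(\alpha, \alpha)$ lands within `eps` of 0 or 1, where a mixed example is essentially one of the two originals.

```python
import numpy as np


def fraction_near_endpoints(alpha, eps=0.01, n=100_000, seed=0):
    """Fraction of lam ~ Beta(alpha, alpha) samples within eps of 0 or 1."""
    rng = np.random.default_rng(seed)
    lam = rng.beta(alpha, alpha, size=n)
    return np.mean((lam < eps) | (lam > 1.0 - eps))


for alpha in (0.05, 0.2, 1.0):
    print(alpha, fraction_near_endpoints(alpha))
```

For small α most samples sit at the endpoints, so training behaves almost like plain ERM. At α = 1, λ is uniform on [0, 1] and only about 2% of samples fall within 0.01 of an endpoint.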