mixup: Beyond Empirical Risk Minimization
Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin and David Lopez-Paz
arXiv e-Print archive - 2017
Keywords:
cs.LG, stat.ML
First published: 2017/10/25
Abstract: Large deep neural networks are powerful, but exhibit undesirable behaviors
such as memorization and sensitivity to adversarial examples. In this work, we
propose mixup, a simple learning principle to alleviate these issues. In
essence, mixup trains a neural network on convex combinations of pairs of
examples and their labels. By doing so, mixup regularizes the neural network to
favor simple linear behavior in-between training examples. Our experiments on
the ImageNet-2012, CIFAR-10, CIFAR-100, Google commands and UCI datasets show
that mixup improves the generalization of state-of-the-art neural network
architectures. We also find that mixup reduces the memorization of corrupt
labels, increases the robustness to adversarial examples, and stabilizes the
training of generative adversarial networks.
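To make the core idea concrete, here is a minimal sketch of mixup as described in the abstract: each training example is replaced by a convex combination of a pair of examples and of their labels, with the mixing coefficient drawn from a Beta distribution. This assumes a PyTorch-style training loop with one-hot labels; the function name `mixup_batch` and the choice of alpha are illustrative, not the authors' reference implementation.

```python
import numpy as np
import torch
import torch.nn.functional as F

def mixup_batch(x, y, alpha=0.2):
    """Return convex combinations of a batch with a shuffled copy of itself.

    x: input tensor of shape (batch, ...)
    y: one-hot (or soft) label tensor of shape (batch, num_classes)
    alpha: Beta distribution parameter; lambda ~ Beta(alpha, alpha)
    """
    lam = np.random.beta(alpha, alpha) if alpha > 0 else 1.0
    index = torch.randperm(x.size(0))           # random pairing within the batch
    mixed_x = lam * x + (1.0 - lam) * x[index]  # convex combination of inputs
    mixed_y = lam * y + (1.0 - lam) * y[index]  # convex combination of labels
    return mixed_x, mixed_y

# Hypothetical usage inside a training step (model and batch are assumed):
# x, y = batch                                  # y as one-hot labels
# x_mix, y_mix = mixup_batch(x, y, alpha=0.2)
# logits = model(x_mix)
# loss = -(y_mix * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
```

Because the mixed labels are soft, the loss is computed as cross-entropy against the interpolated label vector rather than a single class index; this is what encourages the linear behavior between training examples that the abstract mentions.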