First published: 2018/04/10

Abstract: Performance-critical machine learning models should be robust to input
perturbations not seen during training. Adversarial training is a method for
improving a model's robustness to some perturbations by including them in the
training process, but this tends to exacerbate other vulnerabilities of the
model. The adversarial training framework has the effect of translating the
data with respect to the cost function, while weight decay has a scaling
effect. Although weight decay could be considered a crude regularization
technique, it appears superior to adversarial training as it remains stable
over a broader range of regimes and reduces all generalization errors. Equipped
with these abstractions, we provide key baseline results and methodology for
characterizing robustness. The two approaches can be combined to yield a single
small model that demonstrates good robustness to several white-box attacks
defined under different distance metrics.
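To illustrate the translation-versus-scaling contrast in the simplest setting, here is a sketch for logistic regression (notation ours, not the paper's; labels y in {-1, +1}, softplus \zeta(z) = \log(1 + e^{z})); this follows the standard linear-model analysis:

    % standard training objective
    J(w, b) = \mathbb{E}_{x,y}\, \zeta\bigl(-y\,(w^{\top}x + b)\bigr)
    % FGSM adversarial training with budget \epsilon: the loss argument
    % is translated by \epsilon \lVert w \rVert_{1}
    J_{\mathrm{adv}}(w, b) = \mathbb{E}_{x,y}\, \zeta\bigl(\epsilon \lVert w \rVert_{1} - y\,(w^{\top}x + b)\bigr)
    % L2 weight decay with coefficient \lambda adds a penalty term
    J_{\mathrm{wd}}(w, b) = \mathbb{E}_{x,y}\, \zeta\bigl(-y\,(w^{\top}x + b)\bigr) + \tfrac{\lambda}{2} \lVert w \rVert_{2}^{2}

The adversarial objective shifts (translates) the argument of the loss, while the gradient step for the weight-decay term multiplies w by (1 - \eta\lambda) at learning rate \eta, i.e. a scaling of the weights.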
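And a minimal runnable sketch (PyTorch; the model, epsilon, and weight_decay values are illustrative assumptions, not the paper's configuration) of combining the two approaches in a single training step:

    # Sketch only: FGSM adversarial training in the loop (translation effect),
    # L2 weight decay via the optimizer's weight_decay coefficient (scaling effect).
    import torch
    import torch.nn as nn

    def fgsm_perturb(model, loss_fn, x, y, eps):
        """Return x perturbed by FGSM with L-infinity budget eps."""
        x_adv = x.detach().clone().requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        loss.backward()  # gradient w.r.t. the input
        return (x_adv + eps * x_adv.grad.sign()).detach()

    # Hypothetical small model; any classifier would do here.
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
    loss_fn = nn.CrossEntropyLoss()
    # weight_decay is the L2 coefficient lambda from the sketch above.
    opt = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=5e-4)

    def train_step(x, y, eps=0.1):
        x_adv = fgsm_perturb(model, loss_fn, x, y, eps)
        opt.zero_grad()  # clears parameter grads from the FGSM pass
        loss = loss_fn(model(x_adv), y)
        loss.backward()
        opt.step()
        return loss.item()

For example, train_step(torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,))) runs one combined step: the inner FGSM step supplies the translation-style regularization and weight_decay supplies the scaling-style one.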