First published: 2017/10/29 (5 years ago) Abstract: Neural networks are vulnerable to adversarial examples and researchers have
proposed many heuristic attack and defense mechanisms. We address this problem
through the principled lens of distributionally robust optimization, which
guarantees performance under adversarial input perturbations. By considering a
Lagrangian penalty formulation of perturbing the underlying data distribution
in a Wasserstein ball, we provide a training procedure that augments model
parameter updates with worst-case perturbations of training data. For smooth
losses, our procedure provably achieves moderate levels of robustness with
little computational or statistical cost relative to empirical risk
minimization. Furthermore, our statistical guarantees allow us to efficiently
certify robustness for the population loss. For imperceptible perturbations,
our method matches or outperforms heuristic approaches.
A novel method for adversarially-robust learning with theoretical guarantees under small perturbations.
1) Given the default distribution P_0, defines a proximity of it as a set of distributions which are \rho-close to P_0 in terms of Wasserstein metric with a predefined cost function c (e.g. L2);
2) Formulates the robust learning problem as minimization of the worst-case example in the proximity and proposes a Lagrangian relaxation of it;
3) Given it, provides a data-dependent upper bound on the worst-case loss, demonstrates that the problem of finding the worst-case adversarial perturbation, which is generally NP hard, renders to optimization of a concave function if the maximum amount of perturbation \rho is low.