Certifying Some Distributional Robustness with Principled Adversarial Training
Aman Sinha, Hongseok Namkoong, and John Duchi
arXiv e-Print archive, 2017
Keywords: stat.ML, cs.LG
First published: 2017-10-29

Abstract: Neural networks are vulnerable to adversarial examples and researchers have proposed many heuristic attack and defense mechanisms. We address this problem through the principled lens of distributionally robust optimization, which guarantees performance under adversarial input perturbations. By considering a Lagrangian penalty formulation of perturbing the underlying data distribution in a Wasserstein ball, we provide a training procedure that augments model parameter updates with worst-case perturbations of training data. For smooth losses, our procedure provably achieves moderate levels of robustness with little computational or statistical cost relative to empirical risk minimization. Furthermore, our statistical guarantees allow us to efficiently certify robustness for the population loss. For imperceptible perturbations, our method matches or outperforms heuristic approaches.
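The training procedure the abstract describes alternates an inner maximization over perturbed inputs, using the penalized objective sup_z { loss(theta; z) - gamma * c(z, x) }, with an ordinary gradient step on the model parameters at the resulting worst-case points. Below is a minimal sketch of that idea, assuming PyTorch; the function name wrm_step, the squared Euclidean cost for c(z, x), and the values of gamma, ascent_steps, and ascent_lr are illustrative assumptions, not the paper's algorithm or tuned settings.

    import torch

    def wrm_step(model, loss_fn, x, y, opt, gamma=1.0, ascent_steps=15, ascent_lr=0.1):
        # One stochastic training step of a Lagrangian-penalty adversarial scheme:
        # inner gradient ascent approximates
        #   z* = argmax_z [ loss_fn(model(z), y) - gamma * ||z - x||^2 ],
        # then a standard optimizer step updates the model on z*.
        # gamma, ascent_steps, and ascent_lr are illustrative, not the paper's values.
        z = x.clone().detach()
        for _ in range(ascent_steps):
            z = z.detach().requires_grad_(True)
            penalty = gamma * ((z - x) ** 2).sum()
            objective = loss_fn(model(z), y) - penalty
            (grad,) = torch.autograd.grad(objective, z)
            z = z + ascent_lr * grad  # gradient ascent on the penalized objective

        # Outer minimization: descend on the loss evaluated at the worst-case points.
        opt.zero_grad()
        loss = loss_fn(model(z.detach()), y)
        loss.backward()
        opt.step()
        return loss.item()

A typical use would call this once per minibatch, e.g. wrm_step(model, torch.nn.functional.cross_entropy, xb, yb, optimizer); larger gamma keeps the perturbed points closer to the clean inputs, trading robustness for a better-conditioned inner problem.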