First published: 2017/10/29 (4 years ago) Abstract: Neural networks are vulnerable to adversarial examples and researchers have
proposed many heuristic attack and defense mechanisms. We address this problem
through the principled lens of distributionally robust optimization, which
guarantees performance under adversarial input perturbations. By considering a
Lagrangian penalty formulation of perturbing the underlying data distribution
in a Wasserstein ball, we provide a training procedure that augments model
parameter updates with worst-case perturbations of training data. For smooth
losses, our procedure provably achieves moderate levels of robustness with
little computational or statistical cost relative to empirical risk
minimization. Furthermore, our statistical guarantees allow us to efficiently
certify robustness for the population loss. For imperceptible perturbations,
our method matches or outperforms heuristic approaches.
Sinha et al. introduce a variant of adversarial training based on distributional robust optimization. I strongly recommend reading the paper for understanding the introduced theoretical framework. The authors also provide guarantees on the obtained adversarial loss – and show experimentally that this guarantee is a realistic indicator. The adversarial training variant itself follows the general strategy of training on adversarially perturbed training samples in a min-max framework. In each iteration, an attacker crafts an adversarial examples which the network is trained on. In a nutshell, their approach differs from previous ones (apart from the theoretical framework) in the used attacker. Specifically, their attacker optimizes
$\arg\max_z l(\theta, z) - \gamma \|z – z^t\|_p^2$
where $z^t$ is a training sample chosen randomly during training.
On a side note, I also recommend reading the reviews of this paper: https://openreview.net/forum?id=Hk6kPgZA-
Also view this summary at [davidstutz.de](https://davidstutz.de/category/reading/).