First published: 2018/02/09 (4 years ago) Abstract: Adversarial examples that fool machine learning models, particularly deep
neural networks, have been a topic of intense research interest, with attacks
and defenses being developed in a tight back-and-forth. Most past defenses are
best effort and have been shown to be vulnerable to sophisticated attacks.
Recently a set of certified defenses have been introduced, which provide
guarantees of robustness to norm-bounded attacks, but they either do not scale
to large datasets or are limited in the types of models they can support. This
paper presents the first certified defense that both scales to large networks
and datasets (such as Google's Inception network for ImageNet) and applies
broadly to arbitrary model types. Our defense, called PixelDP, is based on a
novel connection between robustness against adversarial examples and
differential privacy, a cryptographically-inspired formalism, that provides a
rigorous, generic, and flexible foundation for defense.
Lecuyer et al. propose a defense against adversarial examples based on differential privacy. Their main insight is that a differential private algorithm is also robust to slight perturbations. In practice, this amounts to injecting noise in some layer (or on the image directly) and using Monte Carlo estimation for computing the expected prediction. The approach is compared to adversarial training against the Carlini+Wagner attack.
Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).