Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks on ShortScience.org

Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks
Nicolas Papernot and Patrick McDaniel and Xi Wu and Somesh Jha and Ananthram Swami
arXiv e-Print archive - 2015 via Local arXiv
Keywords: cs.CR, cs.LG, cs.NE, stat.ML

Summary by David Stutz 5 years ago

Papernot et al. build upon the idea of network distillation [1] and propose a simple mechanism to defend networks against adversarial attacks. The main idea of distillation, originally introduced to “distill” the knowledge of very deep networks into smaller ones, is to train a second, possibly smaller network using the probability distributions predicted by the original, possibly larger network as supervision. Papernot et al., as well as the authors of [1], argue that these probability distributions, i.e. the outputs of the final softmax layer (also referred to as “soft” labels), contain richer information about the task than the true “hard” labels. This allows the second network to achieve similar performance while using fewer parameters or a different architecture.
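To make the difference between “hard” and “soft” labels concrete, here is a minimal sketch in plain Python. The logits are made-up numbers for a hypothetical three-class teacher, and the `temperature` parameter is an assumption taken from the usual formulation of distillation, where a higher softmax temperature yields a flatter distribution.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over a list of logits; a higher temperature gives a flatter distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, predicted):
    """Cross-entropy H(target, predicted); works for one-hot and for soft targets."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)

# Hypothetical teacher logits for a single training sample (illustrative values only).
teacher_logits = [4.0, 2.5, -1.0]

hard_label = [1.0, 0.0, 0.0]                              # only says "class 0"
soft_label = softmax(teacher_logits, temperature=1.0)     # also ranks the other classes
softer_label = softmax(teacher_logits, temperature=5.0)   # flatter, less confident

# Loss of a hypothetical student prediction against each kind of target.
student_prediction = softmax([3.0, 2.0, 0.0])
loss_hard = cross_entropy(hard_label, student_prediction)
loss_soft = cross_entropy(soft_label, student_prediction)
```

The hard label carries no information about how similar class 1 and class 2 are to class 0, whereas the soft label assigns class 1 a visibly larger probability than class 2.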

However, Papernot et al. do not distill a network's knowledge into a smaller one; instead, they use distillation to make networks robust against adversarial attacks. They argue that most algorithms for generating adversarial examples make use of the “adversarial gradient”, i.e. the gradient of the network's cost w.r.t. its input. The adversarial gradient then guides the perturbation of the input image in the direction of wrong classes (the authors consider a classification task for simplicity). Therefore, Papernot et al. argue, the gradient around training samples needs to be reduced; in other words, the model needs to be smoothed.
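The role of the adversarial gradient can be illustrated on a toy model. The sketch below assumes a two-class linear softmax classifier with made-up weights, for which the gradient of the cross-entropy loss w.r.t. the input has the closed form $W^\top (p - y)$. The perturbation step follows the sign of that gradient; this is only an illustration of gradient-guided attacks in general, not necessarily the attack evaluated in the paper.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(W, b, x):
    """Class probabilities of a linear softmax classifier."""
    logits = [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]
    return softmax(logits)

def input_gradient(W, b, x, label):
    """Gradient of the cross-entropy loss w.r.t. the input x: W^T (p - onehot(label))."""
    p = predict(W, b, x)
    delta = [pi - (1.0 if i == label else 0.0) for i, pi in enumerate(p)]
    return [sum(W[i][j] * delta[i] for i in range(len(W))) for j in range(len(x))]

def sign(v):
    return (v > 0) - (v < 0)

def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical weights and input, chosen only for illustration.
W = [[2.0, -1.0], [-1.0, 2.0]]
b = [0.0, 0.0]
x = [0.6, 0.4]
label = 0

grad = input_gradient(W, b, x, label)

# Move each input component a small step in the direction that increases the loss.
eps = 0.2
x_adv = [xj + eps * sign(gj) for xj, gj in zip(x, grad)]

clean_class = argmax(predict(W, b, x))
adv_class = argmax(predict(W, b, x_adv))
```

In this toy setting the small step along the gradient sign is enough to change the predicted class, which is why reducing the magnitude of this gradient around training samples is expected to make such attacks harder.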

https://i.imgur.com/jXIhIGz.png

The proposed approach is very simple: they distill the knowledge of the network into another network with the same architecture and hyperparameters. By using the probability distributions as “soft” labels instead of the hard labels for training, the network is essentially smoothed. The full procedure is illustrated in Figure 1.
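The procedure can be sketched end to end on a toy problem. The example below is a rough illustration under several assumptions: a linear softmax classifier stands in for the deep network, the four data points are made up, and a softmax temperature `T` is used during training of both networks (an assumption based on the usual distillation formulation; the value 5.0 is arbitrary).

```python
import math
import random

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def forward(params, x, temperature=1.0):
    """Class probabilities of a linear softmax classifier."""
    W, b = params
    logits = [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]
    return softmax(logits, temperature)

def train(data, targets, temperature, lr=0.5, epochs=50, seed=0):
    """Fit the classifier by stochastic gradient descent on the cross-entropy loss.
    `targets` may be one-hot (hard) labels or full distributions (soft labels)."""
    rng = random.Random(seed)
    num_classes, dim = len(targets[0]), len(data[0])
    W = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_classes)]
    b = [0.0] * num_classes
    for _ in range(epochs):
        for x, t in zip(data, targets):
            p = forward((W, b), x, temperature)
            for i in range(num_classes):
                # d(loss)/d(logit_i) = (p_i - t_i) / temperature
                g = (p[i] - t[i]) / temperature
                for j in range(dim):
                    W[i][j] -= lr * g * x[j]
                b[i] -= lr * g
    return W, b

def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])

# Made-up, linearly separable toy data with two classes.
data = [[1.0, 0.2], [0.9, 0.1], [0.2, 1.0], [0.1, 0.8]]
hard_labels = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
T = 5.0  # hypothetical distillation temperature

# 1) Train the first network on the hard labels.
teacher = train(data, hard_labels, temperature=T)
# 2) Use its predicted distributions as soft labels.
soft_labels = [forward(teacher, x, temperature=T) for x in data]
# 3) Train a second network with the same architecture and hyperparameters on them.
student = train(data, soft_labels, temperature=T)

# Predictions of the distilled network at the default temperature of 1.
student_predictions = [argmax(forward(student, x)) for x in data]
```

Both calls to `train` use identical settings on purpose, mirroring the point that the second network is not smaller than the first; only the supervision changes from hard to soft labels.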

Despite the simplicity of the approach, I want to highlight some additional key observations:
- Distillation is also supposed to help generalization by avoiding overly confident networks.
- The success rate of adversarial attacks can be reduced significantly as shown in quantitative experiments.
- The amplitude of adversarial gradients can be reduced, which means that the network has been smoothed and is less sensitive to variations in the input samples.

Also see this summary on [davidstutz.de](https://davidstutz.de/category/reading/).
