Defensive Distillation is Not Robust to Adversarial Examples
Carlini, Nicholas and Wagner, David A.
arXiv e-Print archive - 2016 via Local Bibsonomy
Carlini and Wagner show that defensive distillation as a defense against adversarial examples does not work. Specifically, they show that the attack by Papernot et al. [1] can easily be modified to attack distilled networks. Interestingly, the main change is to re-introduce a temperature in the last softmax layer. This temperature, when chosen high enough, takes care of aligning the gradients from the softmax layer with those from the logit layer; without it, the two have significantly different magnitudes, because the distilled network's logits are inflated and the softmax saturates, so its gradients essentially vanish. Personally, I found that this also aligns with the observations in [2], where Carlini and Wagner find that attack objectives defined on the logits work considerably better.
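A minimal NumPy sketch of this effect (the logit values and the factor T = 100 standing in for the distillation temperature are hypothetical, chosen only to illustrate the saturation):

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def softmax_grad_wrt_logits(z, target):
    # Gradient of the target-class probability w.r.t. the logits:
    # d p_t / d z_j = p_t * (delta_tj - p_j)
    p = softmax(z)
    grad = -p[target] * p
    grad[target] += p[target]
    return grad

# Hypothetical logits of a distilled network: training at temperature T
# inflates the logit magnitudes roughly by a factor of T.
T = 100.0
logits = np.array([8.0, 5.0, -3.0]) * T

# At temperature 1 the softmax is saturated: the gradients are ~0,
# so gradient-based attacks on the softmax output stall.
print(softmax_grad_wrt_logits(logits, target=1))      # ~[0, 0, 0]

# Re-introducing the temperature (dividing the logits by T) de-saturates
# the softmax and restores gradients of useful magnitude.
print(softmax_grad_wrt_logits(logits / T, target=1))  # clearly non-zero
```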
[1] N. Papernot, P. McDaniel, X. Wu, S. Jha, A. Swami. Distillation as a defense to adversarial perturbations against deep neural networks. SP, 2016.
[2] N. Carlini, D. Wagner. Towards Evaluating the Robustness of Neural Networks. arXiv, 2016.
Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).