Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples
Athalye, Anish and Carlini, Nicholas and Wagner, David A.
arXiv e-Print archive - 2018 via Local Bibsonomy
Keywords: dblp
Athalye et al. propose methods to circumvent different types of defenses against adversarial examples that rely on obfuscated gradients. In particular, they identify three types of obfuscated gradients: shattered gradients (e.g., caused by non-differentiable parts of a network or by numerical instability), stochastic gradients, and exploding and vanishing gradients. All of these phenomena reduce the effectiveness of gradient-based attacks. Athalye et al. give several indicators for detecting when obfuscated gradients occur. Personally, I find most of these points straightforward, but it is still beneficial to write these “debug strategies” down. The main contribution, however, is a comprehensive evaluation of the eight ICLR’18 defenses against state-of-the-art attacks. As all of them (except adversarial training) rely on obfuscated gradients, Athalye et al. discuss several strategies to “un-obfuscate” the gradients in order to successfully compute adversarial examples, notably Backward Pass Differentiable Approximation (BPDA), Expectation over Transformation (EOT), and reparameterization. Overall, they show that seven out of the eight defenses are not reliable; only adversarial training with projected gradient descent can withstand attacks limited to $\epsilon\approx 0.3$.
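To make the idea of BPDA concrete, here is a minimal PyTorch sketch of attacking a shattered-gradients defense. The quantization preprocessing, the identity backward pass, and the hyperparameters ($\epsilon$, step size, number of steps) are illustrative assumptions on my part, not the exact setups evaluated in the paper.

```python
# Minimal sketch of Backward Pass Differentiable Approximation (BPDA).
# A hypothetical non-differentiable quantization defense shatters gradients;
# on the backward pass it is approximated by the identity so that PGD can
# still follow useful gradients through the defended model.
import torch
import torch.nn as nn


class QuantizeBPDA(torch.autograd.Function):
    """Non-differentiable quantization defense with an identity backward pass."""

    @staticmethod
    def forward(ctx, x):
        # Shattered gradients: rounding has zero gradient almost everywhere.
        return torch.round(x * 8) / 8

    @staticmethod
    def backward(ctx, grad_output):
        # BPDA: treat the defense as the identity when backpropagating.
        return grad_output


def pgd_bpda(model, x, y, epsilon=0.3, alpha=0.01, steps=40):
    """L-infinity PGD attack that backpropagates through the defense via BPDA."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        logits = model(QuantizeBPDA.apply(x_adv))
        loss = nn.functional.cross_entropy(logits, y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = x + torch.clamp(x_adv - x, -epsilon, epsilon)  # project onto epsilon ball
            x_adv = torch.clamp(x_adv, 0, 1)                       # keep valid pixel range
    return x_adv.detach()


if __name__ == "__main__":
    # Toy model and data, purely for illustration.
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
    x = torch.rand(4, 1, 28, 28)
    y = torch.randint(0, 10, (4,))
    x_adv = pgd_bpda(model, x, y)
    print((x_adv - x).abs().max())  # stays within the epsilon ball
```

The same loop structure extends to stochastic defenses by averaging gradients over several forward passes (EOT) instead of replacing the backward pass.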
Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).