First published: 2017/04/05
Abstract: A recent paper suggests that Deep Neural Networks can be protected from
gradient-based adversarial perturbations by driving the network activations
into a highly saturated regime. Here we analyse such saturated networks and
show that the attacks fail due to numerical limitations in the gradient
computations. A simple stabilisation of the gradient estimates enables
successful and efficient attacks. Thus, it has yet to be shown that the
robustness observed in highly saturated networks is not simply due to numerical
limitations.
Brendel et al. propose a decision-based black-box attack against (deep convolutional) neural networks. Specifically, the so-called Boundary Attack starts with a random adversarial example (i.e. random noise that is not classified into the class of the image to be attacked) and randomly perturbs this initialization to move closer to the target image while remaining misclassified. The algorithm is given as pseudocode in Algorithm 1. The key component is the proposal distribution $P$ used to guide the adversarial perturbation in each step. In practice, they use a maximum-entropy distribution (e.g. uniform) subject to three constraints: the perturbed sample is a valid image; the perturbation has a specified relative size, i.e. $\|\eta^k\|_2 = \delta d(o, \tilde{o}^{k-1})$; and the perturbation reduces the distance to the target image $o$ by a relative amount $\epsilon$: $d(o, \tilde{o}^{k-1}) - d(o,\tilde{o}^{k-1} + \eta^k)=\epsilon d(o, \tilde{o}^{k-1})$. This is approximated by sampling from a standard Gaussian, clipping and rescaling, and projecting the perturbation onto the $\epsilon$-sphere around the image. In experiments, they show that this attack is competitive with white-box attacks and can attack real-world systems.
https://i.imgur.com/BmzhiFP.png
Algorithm 1: Minimal pseudo code version of the boundary attack.
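To make the proposal step and the accept/reject loop more concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes images are arrays in $[0, 1]$, that $d$ is the Euclidean distance, and that the model is only accessible through a hypothetical decision function `is_adversarial`. The names `propose` and `boundary_attack`, the `start` argument, and the fixed values of `delta` and `epsilon` are choices made for illustration only; the sketch keeps the step sizes constant rather than tuning them during the attack.

```python
import numpy as np


def propose(original, current, delta, epsilon, rng):
    """Draw one candidate that approximately satisfies the three constraints."""
    dist = np.linalg.norm(original - current)
    # Gaussian direction, rescaled to the relative size delta * d(o, o~).
    eta = rng.standard_normal(current.shape)
    eta *= delta * dist / np.linalg.norm(eta)
    candidate = current + eta
    # Project back onto the sphere around the original with radius d(o, o~).
    diff = candidate - original
    candidate = original + diff * dist / np.linalg.norm(diff)
    # Contract towards the original so the distance shrinks by epsilon * d(o, o~).
    candidate = original + (1.0 - epsilon) * (candidate - original)
    # Clip to keep a valid image (assumed pixel range [0, 1]).
    return np.clip(candidate, 0.0, 1.0)


def boundary_attack(is_adversarial, original, start=None, delta=0.1,
                    epsilon=0.05, steps=1000, max_init=100, seed=0):
    """Random walk that stays adversarial while moving towards `original`."""
    rng = np.random.default_rng(seed)
    if start is None:
        # Initialisation: uniform noise that the model already misclassifies.
        for _ in range(max_init):
            start = rng.uniform(0.0, 1.0, size=original.shape)
            if is_adversarial(start):
                break
        else:
            raise RuntimeError("no adversarial starting point found")
    current = np.asarray(start, dtype=float)
    for _ in range(steps):
        candidate = propose(original, current, delta, epsilon, rng)
        # Only the model's decision is queried; no gradients or scores.
        if is_adversarial(candidate):
            current = candidate
    return current


# Toy usage: a made-up "classifier" that flips its decision when the
# mean pixel value exceeds 0.5.
original = np.full(16, 0.2)
adversarial = boundary_attack(lambda x: x.mean() > 0.5, original)
print(np.linalg.norm(adversarial - original))
```

Because a candidate is only accepted if it is still adversarial, and every candidate is closer to the original than the current point, the distance to the original never increases over the course of the walk.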
Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).