First published: 2016/08/16 Abstract: Neural networks provide state-of-the-art results for most machine learning
tasks. Unfortunately, neural networks are vulnerable to adversarial examples:
given an input $x$ and any target classification $t$, it is possible to find a
new input $x'$ that is similar to $x$ but classified as $t$. This makes it
difficult to apply neural networks in security-critical areas. Defensive
distillation is a recently proposed approach that can take an arbitrary neural
network, and increase its robustness, reducing the success rate of current
attacks' ability to find adversarial examples from $95\%$ to $0.5\%$.
In this paper, we demonstrate that defensive distillation does not
significantly increase the robustness of neural networks by introducing three
new attack algorithms that are successful on both distilled and undistilled
neural networks with $100\%$ probability. Our attacks are tailored to three
distance metrics used previously in the literature, and when compared to
previous adversarial example generation algorithms, our attacks are often much
more effective (and never worse). Furthermore, we propose using high-confidence
adversarial examples in a simple transferability test we show can also be used
to break defensive distillation. We hope our attacks will be used as a
benchmark in future defense attempts to create neural networks that resist
adversarial examples.
Carlini and Wagner propose three novel attacks for generating adversarial examples and show that defensive distillation is not an effective defense. In particular, they devise attacks for the three commonly used norms $L_0$, $L_2$ and $L_\infty$, which measure the deviation of the adversarial perturbation from the original test sample. Starting from the targeted objective
$\min_\delta d(x, x + \delta)$ s.t. $f(x + \delta) = t$ and $x+\delta \in [0,1]^n$,
they consider seven different surrogate objectives for expressing the constraint $f(x + \delta) = t$. Here, $f$ is the classifier implemented by the neural network under attack and $\delta$ denotes the adversarial perturbation. This leads to the formulation
$\min_\delta \|\delta\|_p + cL(x + \delta)$ s.t. $x + \delta \in [0,1]^n$
where $L$ is the surrogate loss. After extensive evaluation, the loss $L$ is taken to be
$L(x') = \max(\max\{Z(x')_i : i\neq t\} - Z(x')_t, -\kappa)$
where $x' = x + \delta$ and $Z(x')_i$ refers to the logit for class $i$; $\kappa$ is a constant ($\kappa = 0$ in their experiments) that can be used to control the confidence of the adversarial example. In practice, the box constraint $x + \delta \in [0,1]^n$ is handled by a change of variables, writing $x + \delta = \frac{1}{2}(\tanh(w) + 1)$ and optimizing over $w$; see the paper for details. Carlini and Wagner then discuss the concrete attacks for all three norms, i.e. $L_0$, $L_2$ and $L_\infty$, where the $L_0$ and $L_\infty$ attacks are treated in more detail because these norms are poorly suited to plain gradient descent (the $L_0$ norm is non-differentiable, and the $L_\infty$ norm only penalizes the largest perturbation component), so both are handled with iterative procedures.
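To make the optimization concrete, the following is a minimal sketch of the $L_2$ variant of this formulation in PyTorch. It assumes a callable `model` that returns the logits $Z(x')$, fixes the trade-off constant $c$ instead of selecting it by binary search as in the paper, and uses illustrative names throughout; it is a sketch of the technique under these assumptions, not the authors' reference implementation.

```python
import torch


def cw_l2_attack(model, x, target, c=1.0, kappa=0.0, steps=1000, lr=0.01):
    """Sketch of an L2 Carlini-Wagner-style attack.

    model  : callable mapping an input batch to logits Z(x') (assumed)
    x      : original input in [0, 1]^n, shape (1, ...)
    target : desired target class t (int)
    c      : trade-off constant (the paper selects it by binary search)
    kappa  : confidence parameter (0 in the paper's experiments)
    """
    # Change of variables: x' = 0.5 * (tanh(w) + 1) keeps x' inside [0, 1]^n.
    # Initialize w so that the attack starts at the original input x.
    w = torch.atanh((2 * x - 1).clamp(-1 + 1e-6, 1 - 1e-6))
    w = w.detach().clone().requires_grad_(True)
    optimizer = torch.optim.Adam([w], lr=lr)

    for _ in range(steps):
        x_adv = 0.5 * (torch.tanh(w) + 1)        # candidate adversarial example
        logits = model(x_adv)[0]                 # Z(x'), shape (num_classes,)

        # Surrogate loss L(x') = max(max_{i != t} Z(x')_i - Z(x')_t, -kappa)
        mask = torch.zeros_like(logits)
        mask[target] = 1.0
        z_other = (logits - mask * 1e9).max()    # max over all classes i != t
        z_target = logits[target]
        loss_adv = torch.clamp(z_other - z_target, min=-kappa)

        # Overall objective: ||delta||_2^2 + c * L(x')
        loss = torch.sum((x_adv - x) ** 2) + c * loss_adv

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return (0.5 * (torch.tanh(w) + 1)).detach()
```

In the paper, this inner optimization is additionally wrapped in a binary search over $c$ to find the smallest constant for which the attack still succeeds.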