Adversarial Spheres
Justin Gilmer, Luke Metz, Fartash Faghri, Samuel S. Schoenholz, Maithra Raghu, Martin Wattenberg, and Ian Goodfellow
arXiv e-Print archive - 2018
Keywords:
cs.CV, 68T45, I.2.6
First published: 2018/01/09
Abstract: State of the art computer vision models have been shown to be vulnerable to
small adversarial perturbations of the input. In other words, most images in
the data distribution are both correctly classified by the model and are very
close to a visually similar misclassified image. Despite substantial research
interest, the cause of the phenomenon is still poorly understood and remains
unsolved. We hypothesize that this counter intuitive behavior is a naturally
occurring result of the high dimensional geometry of the data manifold. As a
first step towards exploring this hypothesis, we study a simple synthetic
dataset of classifying between two concentric high dimensional spheres. For
this dataset we show a fundamental tradeoff between the amount of test error
and the average distance to nearest error. In particular, we prove that any
model which misclassifies a small constant fraction of a sphere will be
vulnerable to adversarial perturbations of size $O(1/\sqrt{d})$. Surprisingly,
when we train several different architectures on this dataset, all of their
error sets naturally approach this theoretical bound. As a result of the
theory, the vulnerability of neural networks to small adversarial perturbations
is a logical consequence of the amount of test error observed. We hope that our
theoretical analysis of this very simple case will point the way forward to
explore how the geometry of complex real-world data sets leads to adversarial
examples.
Gilmer et al. study the existence of adversarial examples on a synthetic toy dataset consisting of two concentric spheres. The dataset is created by randomly sampling examples from two concentric spheres, one with radius $1$ and one with radius $R = 1.3$. While the authors argue that different levels of difficulty can be obtained by varying $R$ and the dimensionality, they only experiment with $R = 1.3$ and a dimensionality of $500$. The motivation to study this dataset comes from the idea that adversarial examples can easily be found by leaving the data manifold. Based on this simple dataset, the authors provide several theoretical insights – see the paper for details.
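To make the setup concrete, here is a minimal NumPy sketch of how such a dataset could be sampled (the function name `sample_spheres` and its interface are my own, not from the paper; the paper's parameters are $d = 500$ and $R = 1.3$). Uniform points on a sphere are obtained by normalizing Gaussian vectors:

```python
import numpy as np

def sample_spheres(n, d=500, R=1.3, seed=0):
    """Sample n points per class from two concentric spheres in R^d.

    Class 0 lies on the unit sphere (radius 1), class 1 on the sphere of
    radius R. Points are drawn uniformly by normalizing Gaussian samples.
    (Function name and interface are my own; the paper uses d = 500, R = 1.3.)
    """
    rng = np.random.default_rng(seed)
    inner = rng.standard_normal((n, d))
    inner /= np.linalg.norm(inner, axis=1, keepdims=True)       # radius 1
    outer = rng.standard_normal((n, d))
    outer = R * outer / np.linalg.norm(outer, axis=1, keepdims=True)  # radius R
    x = np.concatenate([inner, outer], axis=0)
    y = np.concatenate([np.zeros(n), np.ones(n)])
    return x, y

x, y = sample_spheres(1000)
print(x.shape, np.unique(np.round(np.linalg.norm(x, axis=1), 2)))  # -> [1.  1.3]
```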
Besides these theoretical insights, Gilmer et al. also discuss the so-called manifold attack, an attack using projected gradient descent which ensures that the adversarial example stays on the data manifold – moreover, it is ensured that the true class does not change. Unfortunately (as far as I can tell), this idea of a manifold attack is not studied further, which is very unfortunate and raises the question why this concept was introduced in the first place. A rough sketch of how such an attack could look is given below.
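Based on the description above, a manifold attack might look roughly like the following PyTorch sketch (interface, function name, and hyper-parameters are hypothetical, not taken from the paper): projected gradient ascent on the classification loss, where each iterate is re-normalized onto the sphere of the input's class, so the example never leaves the data manifold and its true label stays fixed by construction.

```python
import torch
import torch.nn.functional as F

def manifold_attack(model, x, y, radius, steps=100, step_size=0.01):
    # Hypothetical sketch of a manifold-constrained attack: take gradient steps
    # that increase the loss, then project back onto the sphere of the given
    # radius so the adversarial example stays on the data manifold.
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + step_size * grad                          # ascend the loss
            x_adv = radius * x_adv / x_adv.norm(dim=1, keepdim=True)  # project onto the sphere
    return x_adv.detach()
```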
One of the main take-aways is the suggestion that there is a trade-off between the amount of test error and the average distance to the nearest adversarial example. Thus, the existence of adversarial examples might be related to the question of why deep neural networks generalize so well.
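The geometric intuition behind this trade-off can be illustrated with a small Monte Carlo experiment (my own illustration of concentration of measure, not the paper's proof): if the error set is taken to be a spherical cap of fixed measure $\mu$, the average distance from a random point on the sphere to that cap shrinks roughly like $1/\sqrt{d}$.

```python
import numpy as np

def mean_distance_to_cap(d, mu, n=20000, seed=0):
    # Error set = spherical cap {x : x_1 >= t}, with t chosen so that the cap
    # covers roughly a fraction mu of the unit sphere. Estimate the average
    # Euclidean distance from a uniform point on the sphere to this cap.
    rng = np.random.default_rng(seed)
    samples = rng.standard_normal((n, d))
    x1 = samples[:, 0] / np.linalg.norm(samples, axis=1)   # first coordinate of uniform sphere points
    t = np.quantile(x1, 1.0 - mu)                          # threshold so that P(x_1 >= t) ~ mu
    theta_x = np.arccos(np.clip(x1, -1.0, 1.0))            # angle of each point from the cap's pole
    theta_t = np.arccos(t)                                 # angular radius of the cap
    geodesic = np.maximum(theta_x - theta_t, 0.0)          # geodesic distance to the cap (0 inside it)
    return float(np.mean(2.0 * np.sin(geodesic / 2.0)))    # convert to chord (Euclidean) distance

for d in (50, 500, 5000):
    print(d, mean_distance_to_cap(d, mu=0.01))  # average distance shrinks as d grows
```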
Also see this summary at [davidstutz.de](https://davidstutz.de/category/reading/).