Robustness of classifiers: from adversarial to random noise
Alhussein Fawzi, Seyed-Mohsen Moosavi-Dezfooli and Pascal Frossard
arXiv e-Print archive - 2016
Keywords:
cs.LG, cs.CV, stat.ML
First published: 2016/08/31
Abstract: Several recent works have shown that state-of-the-art classifiers are
vulnerable to worst-case (i.e., adversarial) perturbations of the datapoints.
On the other hand, it has been empirically observed that these same classifiers
are relatively robust to random noise. In this paper, we propose to study a
\textit{semi-random} noise regime that generalizes both the random and
worst-case noise regimes. We propose the first quantitative analysis of the
robustness of nonlinear classifiers in this general noise regime. We establish
precise theoretical bounds on the robustness of classifiers in this general
regime, which depend on the curvature of the classifier's decision boundary.
Our bounds confirm and quantify the empirical observations that classifiers
satisfying curvature constraints are robust to random noise. Moreover, we
quantify the robustness of classifiers in terms of the subspace dimension in
the semi-random noise regime, and show that our bounds remarkably interpolate
between the worst-case and random noise regimes. We perform experiments and
show that the derived bounds provide very accurate estimates when applied to
various state-of-the-art deep neural networks and datasets. This result
suggests bounds on the curvature of the classifiers' decision boundaries that
we support experimentally, and more generally offers important insights onto
the geometry of high dimensional classification problems.
Fawzi et al. study robustness in the transition from random to semi-random and adversarial perturbations. Specifically, they present bounds relating the norm of an adversarial perturbation to the norm of random perturbations; for the exact form I refer to the paper. Personally, I find the definition of semi-random noise most interesting, as it gives an intuition for distinguishing random noise from adversarial examples. As in the related literature, the minimal adversarial perturbation is defined as
$r_S(x_0) = \arg\min_{r \in S} \|r\|_2$ s.t. $f(x_0 + r) \neq f(x_0)$
where $f$ is the classifier to attack and $S$ is the set of allowed perturbations (e.g. requiring that the perturbed samples are still valid images). If $S$ leaves the direction of $r$ essentially unconstrained in the high-dimensional input space, Fawzi et al. consider $r$ to be an adversarial perturbation; intuitively, an adversary can choose $r$ arbitrarily to fool the classifier. If, however, the directions allowed by $S$ are constrained to a randomly chosen $m$-dimensional subspace, Fawzi et al. consider $r$ to be semi-random noise. In the extreme case $m = 1$, $r$ is random noise; we can then intuitively think of $S$ as a randomly chosen one-dimensional subspace, i.e. a random direction in high-dimensional space. The sketch below illustrates these regimes on a toy example.
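To make the subspace view concrete, here is a minimal NumPy sketch for the special case of a binary linear classifier $f(x) = w^\top x + b$; this is a simplification, since the paper's analysis covers nonlinear classifiers via curvature bounds. For a subspace $S$ spanned by an orthonormal basis $V$, the smallest perturbation within $S$ that crosses the decision boundary has a closed form, and varying the subspace dimension $m$ interpolates between the random ($m = 1$) and worst-case ($m = n$) regimes. All function and variable names below are my own, not from the paper.

```python
import numpy as np

def minimal_subspace_perturbation(w, b, x0, V):
    """Smallest r in span(V) with w^T (x0 + r) + b = 0, for a linear classifier.

    Closed form: r = -f(x0) * V V^T w / ||V^T w||^2, hence ||r|| = |f(x0)| / ||V^T w||.
    """
    f_x0 = w @ x0 + b
    Vw = V.T @ w                      # projection of w onto the subspace basis
    return -f_x0 * (V @ Vw) / (Vw @ Vw)

def random_orthonormal_basis(n, m, rng):
    """Orthonormal basis of a random m-dimensional subspace of R^n."""
    Q, _ = np.linalg.qr(rng.standard_normal((n, m)))
    return Q

# Illustrative toy setup (not from the paper): random linear classifier and datapoint.
rng = np.random.default_rng(0)
n = 1000
w = rng.standard_normal(n)
b = 0.0
x0 = rng.standard_normal(n)

# Worst-case (adversarial) regime: S = R^n, i.e. V is a full orthonormal basis.
r_adv = minimal_subspace_perturbation(w, b, x0, np.eye(n))

# Random (m = 1) and semi-random (1 < m < n) regimes: S is a random m-dimensional subspace.
for m in [1, 10, 100, n]:
    V = random_orthonormal_basis(n, m, rng)
    r = minimal_subspace_perturbation(w, b, x0, V)
    print(f"m = {m:4d}: ||r_S|| / ||r_adv|| = {np.linalg.norm(r) / np.linalg.norm(r_adv):.2f}")
```

In this linear toy case the ratio $\|r_S\| / \|r_{\text{adv}}\|$ concentrates around $\sqrt{n/m}$, which is the kind of interpolation between the random and worst-case regimes that the paper quantifies for general, curved decision boundaries.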
Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).