A Theoretical Framework for Robustness of (Deep) Classifiers against Adversarial Examples
Beilun Wang, Ji Gao, and Yanjun Qi
arXiv e-Print archive, 2016
Keywords:
cs.LG, cs.CR, cs.CV
First published: 2016/12/01
Abstract: Most machine learning classifiers, including deep neural networks, are vulnerable to adversarial examples. Such inputs are typically generated by adding small but purposeful modifications that lead to incorrect outputs while imperceptible to human eyes. The goal of this paper is not to introduce a single method, but to make theoretical steps towards fully understanding adversarial examples. By using concepts from topology, our theoretical analysis brings forth the key reasons why an adversarial example can fool a classifier ($f_1$) and adds its oracle ($f_2$, like human eyes) in such analysis. By investigating the topological relationship between two (pseudo)metric spaces corresponding to predictor $f_1$ and oracle $f_2$, we develop necessary and sufficient conditions that can determine if $f_1$ is always robust (strong-robust) against adversarial examples according to $f_2$. Interestingly our theorems indicate that just one unnecessary feature can make $f_1$ not strong-robust, and the right feature representation learning is the key to getting a classifier that is both accurate and strong-robust.
Wang et al. discuss an alternative definition of adversarial examples that takes an oracle classifier into account. Adversarial perturbations are usually constrained in their norm (e.g., the $L_\infty$ norm for images); the main goal of this constraint is to ensure label invariance: if the image did not change noticeably, the label did not change either. As an alternative formulation, the authors consider an oracle for the task, e.g., humans for image classification. An adversarial example is then defined as a slightly perturbed input whose predicted label changes while its true label (i.e., the oracle's label) does not. Additionally, the perturbation can be constrained in some norm; in particular, it can be constrained to the true manifold of the data, as represented by the oracle classifier. Based on this notion of adversarial examples, Wang et al. argue that deep neural networks are not robust because they utilize over-complete feature representations.
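To make this concrete, here is a rough formalization in my own notation (the paper itself reasons about (pseudo)metric spaces induced by $f_1$ and $f_2$ rather than a fixed input-space metric, so $d$ and $\epsilon$ are merely illustrative): given a classifier $f_1$, an oracle $f_2$ and an input $x$, a perturbed input $x'$ is an adversarial example if

$$d(x, x') \leq \epsilon, \qquad f_1(x') \neq f_1(x), \qquad f_2(x') = f_2(x),$$

and $f_1$ is strong-robust with respect to $f_2$ if no such $x'$ exists for any input $x$.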
Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).