First published: 2016/12/19 (7 years ago) Abstract: Deep neural networks are powerful and popular learning models that achieve
state-of-the-art pattern recognition performance on many computer vision,
speech, and language processing tasks. However, these networks have also been
shown susceptible to carefully crafted adversarial perturbations which force
misclassification of the inputs. Adversarial examples enable adversaries to
subvert the expected system behavior leading to undesired consequences and
could pose a security risk when these systems are deployed in the real world.
In this work, we focus on deep convolutional neural networks and demonstrate
that adversaries can easily craft adversarial examples even without any
internal knowledge of the target network. Our attacks treat the network as an
oracle (black-box) and only assume that the output of the network can be
observed on the probed inputs. Our first attack is based on a simple idea of
adding perturbation to a randomly selected single pixel or a small set of them.
We then improve the effectiveness of this attack by carefully constructing a
small set of pixels to perturb by using the idea of greedy local-search. Our
proposed attacks also naturally extend to a stronger notion of
misclassification. Our extensive experimental results illustrate that even
these elementary attacks can reveal a deep neural network's vulnerabilities.
The simplicity and effectiveness of our proposed schemes mean that they could
serve as a litmus test for designing robust networks.