The Limitations of Deep Learning in Adversarial Settings
Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, Ananthram Swami
arXiv e-Print archive - 2015 via Local arXiv
Keywords:
cs.CR, cs.LG, cs.NE, stat.ML
First published: 2015/11/24
Abstract: Deep learning takes advantage of large datasets and computationally efficient
training algorithms to outperform other approaches at various machine learning
tasks. However, imperfections in the training phase of deep neural networks
make them vulnerable to adversarial samples: inputs crafted by adversaries with
the intent of causing deep neural networks to misclassify. In this work, we
formalize the space of adversaries against deep neural networks (DNNs) and
introduce a novel class of algorithms to craft adversarial samples based on a
precise understanding of the mapping between inputs and outputs of DNNs. In an
application to computer vision, we show that our algorithms can reliably
produce samples correctly classified by human subjects but misclassified in
specific targets by a DNN with a 97% adversarial success rate while only
modifying on average 4.02% of the input features per sample. We then evaluate
the vulnerability of different sample classes to adversarial perturbations by
defining a hardness measure. Finally, we describe preliminary work outlining
defenses against adversarial samples by defining a predictive measure of
distance between a benign input and a target classification.
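
The abstract describes crafting adversarial samples from the mapping between a DNN's inputs and outputs while changing only a small fraction of input features. The sketch below illustrates that general idea on a toy model. It is not the authors' algorithm or code: the linear softmax classifier, the function names (`jacobian`, `craft`), the parameters (`theta`, `max_fraction`), and the hand-built weights are all assumptions chosen for illustration. It uses the Jacobian of the model output with respect to the input to pick, one at a time, the feature whose increase most favors a chosen target class, and it reports the fraction of features modified.

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def jacobian(W, b, x):
    """Jacobian of F(x) = softmax(W x + b); shape (classes, features)."""
    p = softmax(W @ x + b)
    # dp_j/dz_k = p_j * (delta_jk - p_k), then chain rule through z = W x + b
    return (np.diag(p) - np.outer(p, p)) @ W

def craft(W, b, x, target, theta=1.0, max_fraction=0.25):
    """Greedy, one-feature-at-a-time perturbation toward `target`.

    Hypothetical helper: increases features in [0, 1] by `theta` until the
    toy model predicts `target` or the feature budget is exhausted.
    """
    x_adv = x.astype(float).copy()
    changed = set()
    budget = int(max_fraction * x.size)
    while np.argmax(W @ x_adv + b) != target and len(changed) < budget:
        J = jacobian(W, b, x_adv)
        grad_target = J[target]
        grad_others = J.sum(axis=0) - grad_target
        # Keep features whose increase raises the target and lowers the rest.
        useful = (grad_target > 0) & (grad_others < 0)
        saliency = np.where(useful, grad_target * np.abs(grad_others), 0.0)
        saliency[x_adv >= 1.0] = 0.0       # saturated features cannot grow
        i = int(np.argmax(saliency))
        if saliency[i] <= 0.0:
            break                           # no useful feature left
        x_adv[i] = min(1.0, x_adv[i] + theta)
        changed.add(i)
    return x_adv, len(changed) / x.size

# Toy setup (assumed): 3 classes, 20 features in [0, 1].
W = np.zeros((3, 20))
W[0, 0:3] = 1.0       # class 0 responds to features 0-2
W[1, 5:10] = 1.0      # class 1 responds to features 5-9
W[2, 10:15] = 1.0     # class 2 responds to features 10-14
b = np.zeros(3)

x = np.zeros(20)
x[0:3] = 1.0          # benign input, classified as class 0

x_adv, fraction = craft(W, b, x, target=1)
print("original prediction:", int(np.argmax(W @ x + b)))
print("adversarial prediction:", int(np.argmax(W @ x_adv + b)))
print("fraction of features modified:", fraction)
```

The fraction returned by `craft` is the toy analogue of the "input features modified per sample" figure quoted in the abstract. For a real DNN the Jacobian would come from automatic differentiation rather than a closed form.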