Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks
Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami
arXiv e-Print archive - 2015
Keywords:
cs.CR, cs.LG, cs.NE, stat.ML
First published: 2015/11/14

Abstract: Deep learning algorithms have been shown to perform extremely well on many
classical machine learning problems. However, recent studies have shown that
deep learning, like other machine learning techniques, is vulnerable to
adversarial samples: inputs crafted to force a deep neural network (DNN) to
provide adversary-selected outputs. Such attacks can seriously undermine the
security of the system supported by the DNN, sometimes with devastating
consequences. For example, autonomous vehicles can be crashed, illicit or
illegal content can bypass content filters, or biometric authentication systems
can be manipulated to allow improper access. In this work, we introduce a
defensive mechanism called defensive distillation to reduce the effectiveness
of adversarial samples on DNNs. We analytically investigate the
generalizability and robustness properties granted by the use of defensive
distillation when training DNNs. We also empirically study the effectiveness of
our defense mechanisms on two DNNs placed in adversarial settings. The study
shows that defensive distillation can reduce the effectiveness of adversarial
sample creation from 95% to less than 0.5% on a studied DNN. Such dramatic
gains can be explained by the fact that distillation reduces the gradients
used in adversarial sample creation by a factor of 10^30. We also find that
distillation increases the average minimum number of features that need to be
modified to create adversarial samples by about 800% on one of the DNNs we
tested.
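
To make the defense concrete, here is a minimal sketch of the distillation training procedure described in the abstract: a teacher network is trained with its softmax taken at a high temperature T, its soft probability vectors are then used as labels for a second (distilled) network trained at the same temperature, and the distilled network is deployed at T = 1. This is an illustrative PyTorch sketch under assumed choices (T = 20, a small MLP, a hypothetical `train_loader`), not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

T = 20.0  # distillation temperature (assumed value; the paper evaluates several)

def make_mlp(in_dim=784, hidden=256, classes=10):
    # Small stand-in network for illustration only.
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, classes))

teacher, student = make_mlp(), make_mlp()

def train(model, loader, soft_targets_from=None, epochs=1, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            logits = model(x)
            if soft_targets_from is None:
                # Step 1: train the teacher on hard labels with softmax at temperature T.
                loss = F.cross_entropy(logits / T, y)
            else:
                with torch.no_grad():
                    # Step 2: soft labels are the teacher's probabilities at temperature T.
                    soft = F.softmax(soft_targets_from(x) / T, dim=1)
                # The distilled network matches the soft labels, also at temperature T.
                loss = -(soft * F.log_softmax(logits / T, dim=1)).sum(dim=1).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()

# Usage, assuming `train_loader` yields (flattened image, label) batches:
#   train(teacher, train_loader)
#   train(student, train_loader, soft_targets_from=teacher)
# At deployment the student's softmax is taken at T = 1, i.e. F.softmax(student(x), dim=1),
# which is what flattens the gradients an adversary would use to craft perturbations.
```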