Delving into Transferable Adversarial Examples and Black-box Attacks
Yanpei Liu
and
Xinyun Chen
and
Chang Liu
and
Dawn Song
arXiv e-Print archive - 2016 via Local arXiv
Keywords:
cs.LG
First published: 2016/11/08 (8 years ago) Abstract: An intriguing property of deep neural networks is the existence of
adversarial examples, which can transfer among different architectures. These
transferable adversarial examples may severely hinder deep neural network-based
applications. Previous works mostly study the transferability using small scale
datasets. In this work, we are the first to conduct an extensive study of the
transferability over large models and a large scale dataset, and we are also
the first to study the transferability of targeted adversarial examples with
their target labels. We study both non-targeted and targeted adversarial
examples, and show that while transferable non-targeted adversarial examples
are easy to find, targeted adversarial examples generated using existing
approaches almost never transfer with their target labels. Therefore, we
propose novel ensemble-based approaches to generating transferable adversarial
examples. Using such approaches, we observe a large proportion of targeted
adversarial examples that are able to transfer with their target labels for the
first time. We also present some geometric studies to help understanding the
transferable adversarial examples. Finally, we show that the adversarial
examples generated using ensemble-based approaches can successfully attack
Clarifai.com, which is a black-box image classification system.