First published: 2018/06/28. Abstract: Deep neural networks are susceptible to adversarial attacks. In computer
vision, well-crafted perturbations to images can cause neural networks to make
mistakes such as identifying a panda as a gibbon or confusing a cat with a
computer. Previous adversarial examples have been designed to degrade
performance of models or cause machine learning models to produce specific
outputs chosen ahead of time by the attacker. We introduce adversarial attacks
that instead reprogram the target model to perform a task chosen by the
attacker---without the attacker needing to specify or compute the desired
output for each test-time input. This attack is accomplished by optimizing for
a single adversarial perturbation, of unrestricted magnitude, that can be added
to all test-time inputs to a machine learning model in order to cause the model
to perform a task chosen by the adversary when processing these inputs---even
if the model was not trained to do this task. These perturbations can thus be
considered a program for the new task. We demonstrate adversarial reprogramming
on six ImageNet classification models, repurposing these models to perform a
counting task, as well as two classification tasks: classifying MNIST and
CIFAR-10 examples presented within the input to the ImageNet model.
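The reprogramming recipe the abstract describes — embed the adversary's small input inside a larger canvas, add one shared "program" perturbation, map a subset of the victim's output classes to the new task's labels, and optimize only the perturbation while the model stays frozen — can be sketched on a toy scale. The snippet below is a minimal illustration, not the paper's method: the "victim" is a small fixed random ReLU network standing in for a pretrained ImageNet model, the adversary's task is a synthetic binary classification of 4-dimensional inputs, and all dimensions, names, and hyperparameters are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "victim" network (a stand-in for a pretrained ImageNet model):
# a fixed random two-layer ReLU net mapping 16-dim inputs to 4 classes.
W1 = rng.normal(size=(64, 16)) / 4.0
W2 = rng.normal(size=(4, 64)) / 8.0

def victim_logits(x):
    h = np.maximum(x @ W1.T, 0.0)   # ReLU hidden layer
    return h @ W2.T, h              # logits (n, 4) and hidden activations

# Adversary's task (synthetic): binary classification of 4-dim inputs,
# label 1 when the feature sum is positive.
X = rng.normal(size=(200, 4))
y = (X.sum(axis=1) > 0).astype(int)

def embed(x_small):
    """Place the small task input inside a larger zero canvas, analogous to
    embedding an MNIST/CIFAR-10 image inside an ImageNet-sized input."""
    canvas = np.zeros((x_small.shape[0], 16))
    canvas[:, 6:10] = x_small
    return canvas

def forward(p):
    # One shared program p is added to every embedded test-time input.
    logits, h = victim_logits(embed(X) + p)
    logits2 = logits[:, :2]         # label mapping: victim classes 0/1 -> task labels 0/1
    z = logits2 - logits2.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(len(y)), y] + 1e-12).mean()
    return loss, probs, h

def grad(p):
    # Backprop the cross-entropy loss to the shared perturbation only;
    # the victim's weights W1, W2 are never updated.
    loss, probs, h = forward(p)
    g_logits = probs.copy()
    g_logits[np.arange(len(y)), y] -= 1.0   # d loss / d mapped logits
    g_h = (g_logits @ W2[:2]) * (h > 0)     # back through the ReLU
    return loss, (g_h @ W1).mean(axis=0)    # back to the program p

p = np.zeros(16)                            # the adversarial "program"
loss0, _, _ = forward(p)
for _ in range(500):
    _, g = grad(p)
    p -= 0.1 * g
loss1, probs, _ = forward(p)
acc = (probs.argmax(axis=1) == y).mean()
print(f"loss {loss0:.3f} -> {loss1:.3f}, task accuracy {acc:.2f}")
```

Note that the victim's nonlinearity is essential: for a purely linear model, a constant additive perturbation could only shift the output bias, whereas shifting the inputs of a ReLU network changes which units are active and hence the effective input-to-output map the adversary gets to exploit.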