On the Intriguing Connections of Regularization, Input Gradients and Transferability of Evasion and Poisoning Attacks
Ambra Demontis, Marco Melis, Maura Pintor, Matthew Jagielski, Battista Biggio, Alina Oprea, Cristina Nita-Rotaru, Fabio Roli
arXiv e-Print archive, 2018
Keywords: cs.LG, cs.CR, stat.ML, 68T10, 68T45
First published: 2018/09/08
Abstract: Transferability captures the ability of an attack against a machine-learning model to be effective against a different, potentially unknown, model. Studying the transferability of attacks has gained interest in recent years due to the deployment of cyber-attack detection services based on machine learning. For these applications of machine learning, service providers avoid disclosing information about their machine-learning algorithms. As a result, attackers trying to bypass detection are forced to craft their attacks against a surrogate model instead of the actual target model used by the service. While previous work has shown that finding test-time transferable attack samples is possible, it is not well understood how an attacker may construct adversarial examples that are likely to transfer to different models, in particular in the case of training-time poisoning attacks. In this paper, we present the first empirical analysis aimed at investigating the transferability of both test-time evasion and training-time poisoning attacks. We provide a unifying, formal definition of the transferability of such attacks and show how it relates to the input gradients of the surrogate and of the target classification models. We assess to what extent some of the most well-known machine-learning systems are vulnerable to transfer attacks, and explain why such attacks succeed (or not) across different models. To this end, we leverage the connections highlighted in this work among the adversarial vulnerability of machine-learning models, their regularization hyperparameters, and their input gradients.
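To make the surrogate-based attack setting concrete, the following is a minimal, hypothetical sketch (not the authors' code): it crafts an evasion perturbation using only the input gradient of a surrogate linear classifier, checks whether it transfers to a differently regularized target, and reports the cosine alignment between the two models' input gradients as a rough indicator of transferability. The dataset, models, regularization values (C), and perturbation budget are illustrative assumptions.

```python
# Hypothetical illustration of a transfer evasion attack via a surrogate model.
# Assumes numpy and scikit-learn; all model and parameter choices are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# The attacker only has gradient access to the surrogate; the target uses a
# different (here, stronger) regularization hyperparameter.
surrogate = LogisticRegression(C=10.0).fit(X_tr, y_tr)
target = LogisticRegression(C=0.01).fit(X_tr, y_tr)

# For a linear model, the input gradient of the decision function is the weight
# vector, so the attack direction is the same for every point of a given class.
grad_s = surrogate.coef_.ravel()
grad_t = target.coef_.ravel()

eps = 1.0  # perturbation budget (illustrative)
signs = np.where(y_te == 1, -1.0, 1.0)  # push each point toward the opposite class
X_adv = X_te + eps * signs[:, None] * grad_s / np.linalg.norm(grad_s)

print("target accuracy on clean inputs      :", target.score(X_te, y_te))
print("target accuracy on transferred attack:", target.score(X_adv, y_te))

# Alignment between surrogate and target input gradients: higher cosine
# similarity suggests attacks optimized on the surrogate transfer more easily.
cos = grad_s @ grad_t / (np.linalg.norm(grad_s) * np.linalg.norm(grad_t))
print("input-gradient cosine similarity     :", cos)
```

Varying C in this sketch changes the target's regularization strength and, with it, the drop in accuracy under the transferred perturbation, which is the kind of relationship between regularization, input gradients, and transferability that the paper investigates systematically.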