First published: 2018/09/25

Abstract: In adversarial attacks on machine-learning classifiers, small perturbations are added to inputs that are correctly classified. The perturbations yield
adversarial examples, which are virtually indistinguishable from the
unperturbed input, and yet are misclassified. In standard neural networks used
for deep learning, attackers can craft adversarial examples from most inputs to
cause a misclassification of their choice.
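For concreteness, here is a minimal sketch of one standard way such perturbations are crafted, the fast gradient sign method; the abstract does not name a specific attack, so this choice, and the bound and clipping range below, are illustrative assumptions:

```python
import numpy as np

def fgsm_perturbation(x, grad_loss_wrt_x, epsilon=0.1):
    """Illustrative fast-gradient-sign perturbation (assumed attack, not
    specified in the abstract): step in the direction that increases the
    classifier's loss, with each component bounded by epsilon."""
    x_adv = x + epsilon * np.sign(grad_loss_wrt_x)
    return np.clip(x_adv, 0.0, 1.0)  # keep inputs in a valid pixel range
```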
We introduce a new type of network unit, the RBFI unit, whose non-linear structure makes it inherently resistant to adversarial attacks. On permutation-invariant MNIST, in the absence of adversarial attacks, networks using
RBFI units match the performance of networks using sigmoid units, and are
slightly below the accuracy of networks with ReLU units. When subjected to
adversarial attacks, networks with RBFI units retain accuracies above 90% for
attacks that degrade the accuracy of networks with ReLU or sigmoid units to
below 2%. RBFI networks trained on regular inputs are more resistant to adversarial attacks than even ReLU and sigmoid networks trained with the help of adversarial examples.
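As an illustration of the kind of non-linear unit involved, here is a minimal sketch of an infinity-norm radial-basis unit. The abstract does not give the functional form of an RBFI unit, so the formula below (a Gaussian of the infinity-norm distance to a learned center w, with per-input sensitivities u) is an assumption for illustration only:

```python
import numpy as np

def rbfi_unit(x, w, u):
    """Hypothetical infinity-norm RBF unit (assumed form, not taken from the
    abstract): responds strongly only when x is close to the center w in
    every coordinate, with per-coordinate sensitivities u."""
    d = np.max(np.abs(u * (x - w)))   # infinity-norm distance, scaled by u
    return np.exp(-d ** 2)            # Gaussian response in that distance

# Small usage example with made-up numbers.
x = np.array([0.2, 0.5, 0.9])
w = np.array([0.25, 0.5, 0.85])
u = np.array([3.0, 3.0, 3.0])
print(rbfi_unit(x, w, u))  # close to 1.0, since x is near w
```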
The non-linear structure of RBFI units makes them difficult to train using
standard gradient descent. We show that networks of RBFI units can be
efficiently trained to high accuracy using pseudogradients, which are computed from functions crafted to facilitate learning rather than from the true derivatives. We show that the use of pseudogradients makes training deep RBFI networks practical, and we compare the accuracy of several structural variants of RBFI networks.
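A minimal sketch of the pseudogradient idea as described above: in the backward pass, the derivative of an operation that is hard to train through (here, the max inside an infinity norm) is replaced by a smoother surrogate that spreads the learning signal. The surrogate chosen below, a softmax-weighted average, is an assumption for illustration; the paper's actual pseudogradient definitions may differ.

```python
import numpy as np

def max_forward(z):
    """Forward pass: the exact max, as used when evaluating the network."""
    return np.max(z)

def max_pseudogradient(z, temperature=1.0):
    """Backward-pass surrogate (illustrative assumption): instead of the true
    derivative of max, which routes the entire gradient to the single largest
    component, distribute it with softmax weights so that every component
    receives some learning signal."""
    e = np.exp((z - np.max(z)) / temperature)
    return e / e.sum()

# Usage: true derivative vs. pseudogradient for a small vector.
z = np.array([0.1, 0.9, 0.3])
true_grad = (z == z.max()).astype(float)   # [0, 1, 0]: only the max learns
pseudo_grad = max_pseudogradient(z)        # all components get some signal
print(max_forward(z), true_grad, pseudo_grad)
```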