First published: 2018/09/25 (6 years ago)

Abstract: In adversarial attacks to machine-learning classifiers, small perturbations
are added to input that is correctly classified. The perturbations yield
adversarial examples, which are virtually indistinguishable from the
unperturbed input, and yet are misclassified. In standard neural networks used
for deep learning, attackers can craft adversarial examples from most input to
cause a misclassification of their choice.
We introduce a new type of network units, called RBFI units, whose non-linear
structure makes them inherently resistant to adversarial attacks. On
permutation-invariant MNIST, in absence of adversarial attacks, networks using
RBFI units match the performance of networks using sigmoid units, and are
slightly below the accuracy of networks with ReLU units. When subjected to
adversarial attacks, networks with RBFI units retain accuracies above 90% for
attacks that degrade the accuracy of networks with ReLU or sigmoid units to
below 2%. RBFI networks trained with regular input are superior in their
resistance to adversarial attacks even to ReLU and sigmoid networks trained
with the help of adversarial examples.
The non-linear structure of RBFI units makes them difficult to train using
standard gradient descent. We show that networks of RBFI units can be
efficiently trained to high accuracies using pseudogradients, computed using
functions especially crafted to facilitate learning instead of their true
derivatives. We show that the use of pseudogradients makes training deep RBFI
networks practical, and we compare several structural alternatives of RBFI
networks for their accuracy.
De Alfaro proposes a deep radial basis function (RBF) network to obtain robustness against adversarial examples. In contrast to “regular” RBF networks, which usually consist of only one hidden layer containing RBF units, de Alfaro proposes to stack multiple layers with RBF units. Specifically, a Gaussian unit utilizing the $L_\infty$ norm is used:
$\exp\left(-\max_i (u_i(x_i - w_i))^2\right)$
where $u_i$ and $w_i$ are learned parameters and $x_i$ are the inputs to the unit, i.e., the network inputs or the outputs of the previous hidden layer. This unit can be understood as computing a soft AND operation; therefore, a complementary OR operation
$1 - \exp\left(-\max_i (u_i(x_i - w_i))^2\right)$
is used as well. In the conducted experiments, hidden layers alternate between these two unit types. Based on these units, de Alfaro argues that the model is less sensitive to adversarial perturbations than models built from the linear operations commonly used in ReLU networks.
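To make the two unit types concrete, here is a minimal PyTorch sketch of how they could be implemented; this is my own illustration, not the author's code, and the class names `RBFIAnd`/`RBFIOr` as well as the parameter initializations are assumptions:

```python
import torch
import torch.nn as nn


class RBFIAnd(nn.Module):
    """Soft AND unit: exp(-max_i (u_i (x_i - w_i))^2), one max per output unit."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # One center w_i and one sensitivity u_i per (output unit, input);
        # the initialization here is an assumption, not taken from the paper.
        self.w = nn.Parameter(torch.rand(out_features, in_features))
        self.u = nn.Parameter(torch.ones(out_features, in_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_features); broadcast against (out_features, in_features).
        z = (self.u * (x.unsqueeze(1) - self.w)) ** 2  # (batch, out, in)
        return torch.exp(-z.max(dim=-1).values)        # (batch, out)


class RBFIOr(RBFIAnd):
    """Soft OR unit: the complement 1 - exp(-max_i (u_i (x_i - w_i))^2)."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return 1.0 - super().forward(x)
```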
For training a deep RBFI network, pseudogradients are used for both the maximum operation and the exponential function. This simplifies training; I refer to the paper for details.
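As a rough illustration of the pseudogradient idea, one can keep the true forward computation but substitute smoother surrogate derivatives in the backward pass, e.g., via `torch.autograd.Function`. The surrogates below are my own illustrative choices, not the paper's exact definitions:

```python
import torch


class PseudoMax(torch.autograd.Function):
    """Computes max_i z_i, but backpropagates through a softmax weighting so
    that all inputs receive gradient, not only the arg-max coordinate."""

    @staticmethod
    def forward(ctx, z):
        ctx.save_for_backward(z)
        return z.max(dim=-1).values

    @staticmethod
    def backward(ctx, grad_output):
        (z,) = ctx.saved_tensors
        # True derivative of max is one-hot at the arg-max; spread it instead.
        weights = torch.softmax(z, dim=-1)
        return grad_output.unsqueeze(-1) * weights


class PseudoNegExp(torch.autograd.Function):
    """Computes exp(-z), but backpropagates a slowly decaying surrogate
    derivative so learning does not stall when the exponential saturates."""

    @staticmethod
    def forward(ctx, z):
        ctx.save_for_backward(z)
        return torch.exp(-z)

    @staticmethod
    def backward(ctx, grad_output):
        (z,) = ctx.saved_tensors
        # Surrogate decays like -1/(1+z) instead of -exp(-z); z >= 0 here,
        # since it is the squared term from the RBFI unit.
        return grad_output * (-1.0 / (1.0 + z))
```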
In the experiments on MNIST, a multi-layer perceptron with the proposed RBF units is used; it consists of 512 AND units, 512 OR units, 512 AND units, and finally 10 OR units. Robustness against FGSM, I-FGSM, and PGD attacks seems to improve. However, the PGD attack used appears to be weaker than usual: it does not reduce the adversarial accuracy of a normal network to near zero.
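Assuming the `RBFIAnd`/`RBFIOr` classes from the sketch above, the described MNIST architecture would be assembled roughly as follows:

```python
import torch
import torch.nn as nn

# The network described above: 512 AND, 512 OR, 512 AND, 10 OR units,
# built from the RBFIAnd/RBFIOr sketches defined earlier.
model = nn.Sequential(
    nn.Flatten(),          # 28x28 MNIST images -> 784-dimensional vectors
    RBFIAnd(28 * 28, 512),
    RBFIOr(512, 512),
    RBFIAnd(512, 512),
    RBFIOr(512, 10),
)

scores = model(torch.rand(8, 1, 28, 28))  # (8, 10) outputs in [0, 1]
```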
Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).