First published: 2017/08/05 (5 years ago) Abstract: Deep neural networks (DNNs) provide state-of-the-art results on various tasks
and are widely used in real world applications. However, it was discovered that
machine learning models, including the best performing DNNs, suffer from a
fundamental problem: they can unexpectedly and confidently misclassify examples
formed by slightly perturbing otherwise correctly recognized inputs. Various
approaches have been developed for efficiently generating these so-called
adversarial examples, but those mostly rely on ascending the gradient of loss.
In this paper, we introduce the novel logits optimized targeting system (LOTS)
to directly manipulate deep features captured at the penultimate layer. Using
LOTS, we analyze and compare the adversarial robustness of DNNs using the
traditional Softmax layer with Openmax, which was designed to provide open set
recognition by defining classes derived from deep representations, and is
claimed to be more robust to adversarial perturbations. We demonstrate that
Openmax provides less vulnerable systems than Softmax to traditional attacks,
however, we show that it can be equally susceptible to more sophisticated
adversarial generation techniques that directly work on deep representations.