Adversarial Defense based on Structure-to-Signal Autoencoders
Folz, Joachim; Palacio, Sebastian; Hees, Jörn; Borth, Damian; Dengel, Andreas
arXiv e-Print archive, 2018
Folz et al. propose an auto-encoder-based defense against adversarial examples. In particular, they propose structure-to-signal auto-encoders (S2SNets) as a defense mechanism. The auto-encoder is first trained in an unsupervised fashion to reconstruct images; this step is independent of any attack model and of the classification network under attack. Then, the decoder is fine-tuned using gradients from the classification network. Their main argument is that the gradients of the composite network (auto-encoder plus classification network) are no longer class-specific, because only the decoder is fine-tuned while the encoder is left unchanged; the encoder is trained to encode any image, independent of the classification task. Experimentally, they show that the gradients are indeed less class-specific. Additionally, the authors highlight that this defense is independent of the attack model and can be applied to any pre-trained classification model. Unfortunately, the approach is not compared against other defense mechanisms; while related work is mentioned, such a comparison would have been useful.
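The two training stages described above can be illustrated with a minimal NumPy sketch. It rests on simplifying assumptions and is not the authors' S2SNet. The encoder and decoder are single linear maps. The "pre-trained" classifier is a hand-set linear softmax model `V`. The data are random 8-dimensional vectors. Fine-tuning minimizes cross-entropy against ground-truth labels, although the paper's exact fine-tuning objective may differ. All function and variable names are hypothetical. The sketch only shows the structure of the procedure: stage 1 uses neither labels nor the classifier, and stage 2 backpropagates the frozen classifier's loss into the decoder while the encoder stays fixed.

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(z):
    # Row-wise softmax; subtracting the row max keeps exp() numerically stable.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def init_autoencoder(d, k):
    # Hypothetical linear encoder (d -> k) and linear decoder (k -> d).
    return {"W_enc": rng.normal(0.0, 0.1, (d, k)),
            "W_dec": rng.normal(0.0, 0.1, (k, d))}


def reconstruct(ae, x):
    return x @ ae["W_enc"] @ ae["W_dec"]


def reconstruction_error(ae, x):
    # Mean over images of the squared L2 reconstruction error.
    return np.mean(np.sum((reconstruct(ae, x) - x) ** 2, axis=1))


def cross_entropy(ae, V, x, y):
    # Loss of the frozen classifier V applied to the reconstructions.
    p = softmax(reconstruct(ae, x) @ V)
    return -np.mean(np.log(p[np.arange(len(y)), y] + 1e-12))


def train_unsupervised(ae, x, lr=0.05, steps=500):
    """Stage 1: train encoder and decoder to reconstruct images.
    Uses neither labels nor the classifier, so it is attack-agnostic."""
    n = x.shape[0]
    for _ in range(steps):
        h = x @ ae["W_enc"]
        g_out = 2.0 * (h @ ae["W_dec"] - x) / n   # dLoss / d(reconstruction)
        g_enc = x.T @ (g_out @ ae["W_dec"].T)      # computed before W_dec changes
        ae["W_dec"] -= lr * (h.T @ g_out)
        ae["W_enc"] -= lr * g_enc
    return ae


def finetune_decoder(ae, V, x, y, lr=0.05, steps=200):
    """Stage 2: freeze the encoder and update only the decoder, using
    gradients of the frozen classifier's cross-entropy loss."""
    n, c = x.shape[0], V.shape[1]
    onehot = np.eye(c)[y]
    h = x @ ae["W_enc"]                           # frozen encoder: computed once
    for _ in range(steps):
        p = softmax(h @ ae["W_dec"] @ V)
        g_xhat = ((p - onehot) / n) @ V.T         # backprop through the classifier
        ae["W_dec"] -= lr * (h.T @ g_xhat)        # the decoder is the only update
    return ae


# Toy setup (assumption): 8-dim "images", a 4-dim code, and a hand-set
# linear classifier V standing in for a pre-trained network.
d, k = 8, 4
x = rng.normal(size=(200, d))
y = (x[:, 0] > 0).astype(int)
V = np.zeros((d, 2))
V[0] = [-1.0, 1.0]  # on a raw input, logit_1 - logit_0 = 2 * x[0], matching y

ae = init_autoencoder(d, k)
recon_before = reconstruction_error(ae, x)
ae = train_unsupervised(ae, x)
recon_after = reconstruction_error(ae, x)

W_enc_frozen = ae["W_enc"].copy()
ce_before = cross_entropy(ae, V, x, y)
ae = finetune_decoder(ae, V, x, y)
ce_after = cross_entropy(ae, V, x, y)

print(f"reconstruction error: {recon_before:.3f} -> {recon_after:.3f}")
print(f"classifier loss on reconstructions: {ce_before:.3f} -> {ce_after:.3f}")
print("encoder unchanged:", np.array_equal(W_enc_frozen, ae["W_enc"]))
```

Because only `W_dec` is updated in stage 2, the encoder remains a task-agnostic representation of the input. This is the property the authors rely on when arguing that the composite network's gradients are less class-specific.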
Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).