Adversarial Diversity and Hard Positive Generation
Andras Rozsa, Ethan M. Rudd, and Terrance E. Boult
arXiv e-Print archive - 2016 via Local arXiv
Keywords: cs.CV
First published: 2016/05/05
Abstract: State-of-the-art deep neural networks suffer from a fundamental problem -
they misclassify adversarial examples formed by applying small perturbations to
inputs. In this paper, we present a new psychometric perceptual adversarial
similarity score (PASS) measure for quantifying adversarial images, introduce
the notion of hard positive generation, and use a diverse set of adversarial
perturbations - not just the closest ones - for data augmentation. We introduce
a novel hot/cold approach for adversarial example generation, which provides
multiple possible adversarial perturbations for every single image. The
perturbations generated by our novel approach often correspond to semantically
meaningful image structures, and allow greater flexibility to scale
perturbation-amplitudes, which yields an increased diversity of adversarial
images. We present adversarial images on several network topologies and
datasets, including LeNet on the MNIST dataset, and GoogLeNet and ResidualNet
on the ImageNet dataset. Finally, we demonstrate on LeNet and GoogLeNet that
fine-tuning with a diverse set of hard positives improves the robustness of
these networks compared to training with prior methods of generating
adversarial images.
Rozsa et al. propose PASS, a perceptual similarity metric that is invariant to homographies, to quantify adversarial perturbations. PASS builds on the structural similarity index SSIM [1]:
$\text{PASS}(\tilde{x}, x) = \text{SSIM}(\psi(\tilde{x}, x), x)$
where $\psi(\tilde{x}, x)$ aligns the perturbed image $\tilde{x}$ with the original image $x$ by applying a homography $H$ (which can be found through optimization). Based on this similarity metric, they consider additional attacks whose perturbations remain perceptually small, i.e., retain a high PASS score, while having larger $L_p$ norms; see the paper for experimental results.
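To make the definition concrete, here is a minimal NumPy sketch of a PASS-like score. It is an illustration under simplifying assumptions, not the authors' implementation: the homography $H$ is passed in rather than found through optimization, SSIM is computed over a single global window instead of the local windows of [1], and the function names `global_ssim`, `warp_homography`, and `pass_score` are hypothetical.

```python
import numpy as np


def global_ssim(a, b, data_range=1.0):
    """SSIM over one global window (a simplification of the windowed SSIM in [1])."""
    c1 = (0.01 * data_range) ** 2  # stabilizing constants with the usual K1, K2
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return num / den


def warp_homography(img, H):
    """Resample img so that out(p) = img(H p); nearest neighbour, edge-clamped."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # homogeneous coords
    src = H @ pts
    sx = np.clip(np.rint(src[0] / src[2]).astype(int), 0, w - 1)
    sy = np.clip(np.rint(src[1] / src[2]).astype(int), 0, h - 1)
    return img[sy, sx].reshape(h, w)


def pass_score(x_tilde, x, H=None):
    """PASS-like score: align x_tilde to x with H (assumed given), then SSIM."""
    H = np.eye(3) if H is None else H
    return global_ssim(warp_homography(x_tilde, H), x)


# Toy example: the "perturbation" is a pure 2-pixel horizontal shift.
rng = np.random.default_rng(0)
x = rng.random((32, 32))
x_shifted = np.roll(x, 2, axis=1)
H_shift = np.array([[1.0, 0.0, 2.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
print("without alignment:", pass_score(x_shifted, x))
print("with alignment:   ", pass_score(x_shifted, x, H_shift))
```

In this toy case the shifted image has a large $L_p$ distance from $x$, yet it scores close to 1 once aligned, which is the kind of perturbation the homography step is meant to discount.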
[1] Z. Wang, A. C. Bovik, H. R. Sheikh, E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. TIP, 2004.
Also see this summary at [davidstutz.de](https://davidstutz.de/category/reading/).