Intriguing properties of neural networks
Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, Rob Fergus
arXiv e-Print archive - 2013 via Local arXiv
Keywords:
cs.CV, cs.LG, cs.NE
First published: 2013/12/21
Abstract: Deep neural networks are highly expressive models that have recently achieved
state of the art performance on speech and visual recognition tasks. While
their expressiveness is the reason they succeed, it also causes them to learn
uninterpretable solutions that could have counter-intuitive properties. In this
paper we report two such properties.
First, we find that there is no distinction between individual high level
units and random linear combinations of high level units, according to various
methods of unit analysis. This suggests that it is the space, rather than the
individual units, that contains the semantic information in the high layers
of neural networks.
Second, we find that deep neural networks learn input-output mappings that
are fairly discontinuous to a significant extent. We can cause the network to
misclassify an image by applying a certain imperceptible perturbation, which is
found by maximizing the network's prediction error. In addition, the specific
nature of these perturbations is not a random artifact of learning: the same
perturbation can cause a different network, that was trained on a different
subset of the dataset, to misclassify the same input.
### Keywords
Adversarial example, Perturbations
------
### Summary
##### Introduction
* Explains two properties of neural networks that cause them to misclassify images and make it difficult to form a solid understanding of the network:
1. The semantics of individual high-level units of a network versus combinations of these units.
2. The continuity of the input-output mapping and the stability of the output with respect to the input.
* Experiments are performed on several datasets and architectures:
1. MNIST dataset - autoencoder, fully connected net
2. ImageNet - “AlexNet”
3. 10M YouTube images - “QuocNet”
##### Understanding individual units of the Network
* Previous work inspected individual units by finding the input images that maximize the activation of each feature unit; the authors repeat this experiment on the MNIST dataset.
* The results are interpreted as follows (a small sketch of the experiment is given below the figure):
1. A random direction vector $v$ gives rise to images with similarly interpretable semantic properties as a natural-basis (single-unit) direction.
2. Each feature unit, or random direction, is invariant over a particular subset of the input distribution.
https://i.imgur.com/SeyXJeV.png
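Below is a minimal sketch of this kind of unit inspection, assuming a hypothetical feature extractor `phi` and a random stand-in image set; none of the names come from the paper's code. It ranks held-out images by their activation along either a natural-basis direction $e_i$ (a single unit) or a random direction $v$, which is the comparison the authors report.

```python
# Minimal sketch: rank held-out images by <phi(x), direction> for a single-unit
# direction e_i versus a random direction v. `phi` and `images` are stand-ins.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in feature extractor: flatten a 28x28 image into 64 hidden activations.
phi = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU())

images = torch.rand(1000, 1, 28, 28)      # stand-in held-out set I
with torch.no_grad():
    feats = phi(images)                    # phi(x) for every x in I

def top_images(direction, k=8):
    """Return the k images whose features project most strongly onto `direction`."""
    scores = feats @ direction             # <phi(x), direction> for every image
    return images[scores.topk(k).indices]

e_i = torch.zeros(64)
e_i[3] = 1.0                               # natural-basis direction: a single unit
unit_maximizers = top_images(e_i)

v = torch.randn(64)                        # random direction in activation space
random_dir_maximizers = top_images(v)      # paper: these look just as coherent
```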
##### Blind spots in the neural network
* The deep stack of non-linear layers encodes a non-local generalization prior over the input space.
* This makes it possible for output units to assign non-negligible probabilities to regions of the input space that contain no training examples in their vicinity, i.e. the network can assign a reasonable probability to viewpoints of an object it was never trained on.
* However, the smoothness assumption that underlies kernel methods cannot be assumed to hold for deep networks.
* Using optimization techniques, imperceptibly small changes to an image can be found that lead to very large deviations in the network's output (see the sketch below).
* __“Adversarial examples”__ represent low-probability pockets in the input space which are hard to find by simply moving randomly around an input image.
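Below is a minimal sketch of that optimization. The paper solves the constrained formulation with box-constrained L-BFGS; this stand-in instead runs plain gradient descent on a penalized objective (perturbation norm plus classification loss towards a chosen target label), and the model, image, and target are placeholders rather than the paper's setup.

```python
# Minimal sketch: find a small perturbation r so that model(x + r) predicts
# the chosen target class, by gradient descent on c*||r|| + loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))   # stand-in classifier
x = torch.rand(1, 1, 28, 28)                                  # clean image in [0, 1]
target = torch.tensor([7])                                    # desired (wrong) label l
c = 0.1                                                       # weight on the norm of r

r = torch.zeros_like(x, requires_grad=True)
optimizer = torch.optim.SGD([r], lr=0.1)

for _ in range(200):
    optimizer.zero_grad()
    adv = (x + r).clamp(0.0, 1.0)          # keep x + r a valid image
    loss = c * r.norm() + F.cross_entropy(model(adv), target)
    loss.backward()
    optimizer.step()

adversarial_image = (x + r).detach().clamp(0.0, 1.0)
print(model(adversarial_image).argmax(dim=1))   # ideally the target class
```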
##### Experimental Results
* Adversarial examples that are visually indistinguishable from the original image can be created for all networks studied.
1. Cross-model generalization: adversarial images created for one network also cause misclassification in other networks.
2. Cross training-set generalization: adversarial images remain effective against models trained on a different, disjoint subset of the data (a sketch of how this can be measured follows the figure below).
https://i.imgur.com/drcGvpz.png
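A rough sketch of how cross-model generalization can be quantified: craft adversarial examples against one model (e.g. with the procedure sketched earlier) and measure the error rate of an independently trained model on them. All names below are stand-ins, not the paper's code.

```python
# Minimal sketch: error rate of model_b on adversarial examples crafted for model_a.
import torch
import torch.nn as nn

torch.manual_seed(1)
model_a = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
model_b = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # independently trained

adversarial_batch = torch.rand(128, 1, 28, 28)   # placeholder for examples crafted on model_a
true_labels = torch.randint(0, 10, (128,))       # placeholder ground-truth labels

with torch.no_grad():
    preds_b = model_b(adversarial_batch).argmax(dim=1)
transfer_error = (preds_b != true_labels).float().mean().item()
print(f"model_b error rate on model_a's adversarial examples: {transfer_error:.2%}")
```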
##### Conclusion
* Neural networks have counter-intuitive properties with respect to the meaning of individual units and the discontinuity of the learned input-output mapping.
* The occurrence of adversarial examples and their cross-model and cross-training-set generalization.
-----
### Notes
* Feeding adversarial examples back in during training can improve the generalization of the model (a training-loop sketch is given after this list).
* Adversarial examples generated for the higher layers seem more useful in training than those generated for the input or lower layers.
* Adversarial examples also affect models trained with different hyperparameters.
* According to the tests conducted, autoencoders are more resilient to adversarial examples.
* Networks trained purely by supervised learning are unstable with respect to particular types of perturbations: adding a small perturbation to the input can lead to a large perturbation at the output of the last layer.
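A minimal sketch of the training idea from the first note above: each batch is augmented with adversarial examples regenerated against the current model, so the adversarial pool keeps adapting as training progresses. The `make_adversarial` helper is a placeholder for the paper's optimization-based generation, and the model and data are stand-ins.

```python
# Minimal sketch: train on clean batches mixed with freshly generated adversarial ones.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def make_adversarial(x):
    # Placeholder: the paper would solve min ||r|| s.t. the perturbed input is misclassified.
    return (x + 0.1 * torch.randn_like(x)).clamp(0.0, 1.0)

for step in range(100):
    x = torch.rand(32, 1, 28, 28)            # stand-in training batch
    y = torch.randint(0, 10, (32,))
    x_adv = make_adversarial(x)              # regenerated against the current model
    inputs = torch.cat([x, x_adv])           # clean + adversarial inputs
    labels = torch.cat([y, y])               # adversarial copies keep the true label
    optimizer.zero_grad()
    F.cross_entropy(model(inputs), labels).backward()
    optimizer.step()
```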
### Open research questions
[1] How do the effects of adversarial examples at lower layers compare to those at higher layers?
[2] How strongly do adversarial attacks depend on the training set of the model?
[3] Why do adversarial examples generalize across different hyperparameters and training sets?
[4] How often do adversarial examples occur?
------
The paper reports two key properties of deep neural networks:
- Semantic meaning of individual units.
  - Earlier works analyzed learnt semantics by finding images that maximally activate individual units.
  - The authors observe that there is no difference between individual units and random linear combinations of units.
  - It is the entire space of activations that contains the bulk of the semantic information.
- Stability of neural networks to small perturbations in input space.
  - Networks that generalize well are expected to be robust to small perturbations in the input, i.e. imperceptible noise in the input shouldn't change the predicted class.
  - The authors find that networks can be made to misclassify an image by applying a certain imperceptible perturbation, which is found by maximizing the network's prediction error.
  - These 'adversarial examples' generalize well to different architectures trained on different data subsets.
## Strengths
- The authors propose a way to make networks more robust to small perturbations by training them with adversarial examples in an adaptive manner, i.e. keep changing the pool of adversarial examples during training. In this regard, they draw a connection with hard-negative mining, and a network trained with adversarial examples performs better than others.
- The formal description of how to generate adversarial examples and the mathematical analysis of a network's stability to perturbations (via per-layer operator norms) are useful contributions; a sketch of the latter follows this list.
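A minimal sketch of the flavor of that stability analysis: for a network composed of linear layers and 1-Lipschitz non-linearities such as ReLU, the product of the per-layer operator norms (largest singular values of the weight matrices) upper-bounds how much the output can move per unit change of the input. The small fully connected model below is a stand-in, not the network analysed in the paper.

```python
# Minimal sketch: multiply per-layer spectral norms to get a Lipschitz upper bound.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(),
                      nn.Linear(256, 10))

lipschitz_upper_bound = 1.0
for layer in model:
    if isinstance(layer, nn.Linear):
        spectral_norm = torch.linalg.svdvals(layer.weight)[0]  # largest singular value
        lipschitz_upper_bound *= spectral_norm.item()
# ReLU is 1-Lipschitz, so it does not increase the bound.
print(f"upper bound on ||f(x+r) - f(x)|| / ||r||: {lipschitz_upper_bound:.2f}")
```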
## Weaknesses / Notes
- That two images which are visually indistinguishable to humans can be classified differently by the network is indeed an intriguing observation.
- The paper feels a little half-baked in parts, and some ideas could've been presented more clearly.
------
Szegedy et al. were (to the best of my knowledge) the first to describe the phenomenon of adversarial examples as researched today. Specifically, they pose the search for an adversarial example as the optimization problem

$$\arg\min_r \|r\|_2 \quad \text{s.t.} \quad f(x+r) = l \ \text{ and } \ x+r \text{ being a valid image, i.e. } x+r \in [0,1]^m,$$

where $f$ is the neural network and $l$ is the target class (i.e. a targeted adversarial example). In the paper, they originally headlined the corresponding section “Blind Spots in Neural Networks”. While they give some explanation and provide experiments, also introducing the notion of transferability of adversarial examples and the idea of using adversarial examples as regularization during training, many questions are left open. The stated conclusions, namely that adversarial examples are highly unlikely and yet lie dense within the set of regular examples, are controversial in the literature.
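To make this constrained problem tractable, the paper approximates it with a penalized objective that is minimized by box-constrained L-BFGS, using a line search over the penalty weight $c$ to find the smallest perturbation that still achieves $f(x+r) = l$:

$$\min_r \; c\,\|r\|_2 + \operatorname{loss}_f(x+r,\, l) \quad \text{s.t.} \quad x+r \in [0,1]^m.$$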