Interpretation of Neural Networks is Fragile on ShortScience.org

Interpretation of Neural Networks is Fragile
Ghorbani, Amirata and Abid, Abubakar and Zou, James Y.
arXiv e-Print archive - 2017 via Local Bibsonomy

Summaries/Notes 1

Summary by David Stutz 6 years ago

Ghorbani et al. show that neural network visualization techniques, often introduced to improve interpretability, are themselves susceptible to adversarial examples. For example, they consider common feature-importance visualization techniques and aim to find an adversarial example that leaves the predicted label unchanged but alters the original interpretation, e.g., as measured on some of the most important features. Figure 1 shows examples of the so-called top-1000 attack, in which the 1000 most important features are changed during the attack. The general finding, i.e., that interpretations are neither robust nor reliable, is highly relevant to the acceptance and security of deep learning systems in practice.

https://i.imgur.com/QFyrSeU.png
Figure 1: Examples of changed interpretations.
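
To make the success criterion concrete, here is a minimal sketch in plain Python of how one might check whether a perturbation changed the interpretation but not the prediction. It assumes feature importances are given as flat lists of scores and measures how many of the original top-k features survive in the perturbed top-k. The function names, the `max_overlap` threshold, and the toy scores are all hypothetical and chosen for illustration; this is not the authors' implementation, and it does not perform the attack itself.

```python
def top_k_indices(importance, k):
    """Return the set of indices of the k largest importance scores."""
    order = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    return set(order[:k])


def top_k_overlap(original, perturbed, k):
    """Fraction of the original top-k features that are still in the perturbed top-k."""
    if len(original) != len(perturbed):
        raise ValueError("importance maps must have the same length")
    if not 0 < k <= len(original):
        raise ValueError("k must be between 1 and the number of features")
    kept = top_k_indices(original, k) & top_k_indices(perturbed, k)
    return len(kept) / k


def interpretation_changed(label_before, label_after, original, perturbed, k,
                           max_overlap=0.5):
    """True if the label is unchanged while the top-k overlap is at most max_overlap.

    max_overlap is an illustrative threshold (an assumption), not a value from the paper.
    """
    same_label = label_before == label_after
    return same_label and top_k_overlap(original, perturbed, k) <= max_overlap


# Toy importance scores for six features, before and after a perturbation.
original = [0.9, 0.1, 0.8, 0.05, 0.3, 0.02]
perturbed = [0.1, 0.7, 0.85, 0.6, 0.2, 0.05]

overlap = top_k_overlap(original, perturbed, k=2)
changed = interpretation_changed(3, 3, original, perturbed, k=2)
```

In this toy case the original top-2 features are indices 0 and 2, while the perturbed top-2 are indices 2 and 1, so only half of the most important features survive even though the label stays the same. For the top-1000 setting described above, `k` would be 1000 and the lists would be flattened importance maps.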

Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).
