Sanity Checks for Saliency Maps
Julius Adebayo
and
Justin Gilmer
and
Michael Muelly
and
Ian Goodfellow
and
Moritz Hardt
and
Been Kim
arXiv e-Print archive - 2018 via Local arXiv
Keywords:
cs.CV, cs.LG, stat.ML
First published: 2018/10/08 (5 years ago) Abstract: Saliency methods have emerged as a popular tool to highlight features in an
input deemed relevant for the prediction of a learned model. Several saliency
methods have been proposed, often guided by visual appeal on image data. In
this work, we propose an actionable methodology to evaluate what kinds of
explanations a given method can and cannot provide. We find that reliance,
solely, on visual assessment can be misleading. Through extensive experiments
we show that some existing saliency methods are independent both of the model
and of the data generating process. Consequently, methods that fail the
proposed tests are inadequate for tasks that are sensitive to either data or
model, such as, finding outliers in the data, explaining the relationship
between inputs and outputs that the model learned, and debugging the model. We
interpret our findings through an analogy with edge detection in images, a
technique that requires neither training data nor model. Theory in the case of
a linear model and a single-layer convolutional neural network supports our
experimental findings.
**Idea:** With the growing use of visual explanation systems of machine learning models such as saliency maps, there needs to be a standardized method of verifying if a saliency method is correctly describing the underlying ML model.
**Solution:** In this paper two Sanity Checks have been proposed to verify the accuracy and the faithfulness of a saliency method:
* *Model parameter randomization test:* In this sanity check the outputs of a saliency method on a trained model is compared to that of the same method on an untrained randomly parameterized model. If these images are similar/identical then this saliency method does not correctly describe the model. In the course of this experiment it is found that certain methods such as the Guided BackProp are constant in their explanations despite alterations in the model.
* *Data Randomization Test:* This method explores the relationship of saliency methods to data and their associated labels. In this test, the labels of the training data are randomized thus there should be no definite pattern describing the model (Since the model is as good as randomly guessing an output label). If there is a definite pattern, this shows that the saliency methods are independent of the underlying model/training data labels. In this test as well Guided BackProp did not fare well, implying this saliency method is as good as an edge detector as opposed to a ML explainer.
Thus this paper makes a valid argument toward having standardized tests that an interpretation model must satisfy to be deemed accurate or faithful.