Explaining Image Classifiers by Counterfactual Generation
Chun-Hao Chang, Elliot Creager, Anna Goldenberg, and David Duvenaud
arXiv e-Print archive - 2018
Keywords:
cs.CV
First published: 2018/07/20
Abstract: When an image classifier makes a prediction, which parts of the image are
relevant and why? We can rephrase this question to ask: which parts of the
image, if they were not seen by the classifier, would most change its decision?
Producing an answer requires marginalizing over images that could have been
seen but weren't. We can sample plausible image in-fills by conditioning a
generative model on the rest of the image. We then optimize to find the image
regions that most change the classifier's decision after in-fill. Our approach
contrasts with ad-hoc in-filling approaches, such as blurring or injecting
noise, which generate inputs far from the data distribution, and ignore
informative relationships between different parts of the image. Our method
produces more compact and relevant saliency maps, with fewer artifacts compared
to previous methods.
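
The core procedure described in the abstract can be sketched compactly: parameterize a mask over the image, replace the masked region with samples from a conditional generative in-filler, and optimize the mask so that the classifier's confidence in its original prediction drops, with a sparsity penalty to keep the saliency map compact. The snippet below is a minimal illustrative sketch, not the authors' released implementation; the function name `counterfactual_saliency`, the `infill_model` interface, the PyTorch setting, and all hyperparameters are assumptions chosen for illustration.

```python
# Minimal sketch (assumed interfaces, not the paper's official code) of
# counterfactual in-fill saliency: optimize a mask so that replacing the
# masked region with plausible generative in-fills lowers the classifier's
# confidence in its original prediction.
import torch
import torch.nn.functional as F

def counterfactual_saliency(classifier, infill_model, image,
                            n_samples=4, n_steps=200, sparsity=1e-3, lr=0.05):
    """image: (1, C, H, W) tensor; returns a (1, 1, H, W) saliency mask in [0, 1]."""
    classifier.eval()
    with torch.no_grad():
        target = classifier(image).argmax(dim=1)          # class to explain

    # Continuous mask parameterized by logits so values stay in (0, 1).
    mask_logits = torch.zeros(1, 1, *image.shape[2:],
                              device=image.device, requires_grad=True)
    opt = torch.optim.Adam([mask_logits], lr=lr)

    for _ in range(n_steps):
        mask = torch.sigmoid(mask_logits)
        loss = sparsity * mask.mean()                      # prefer compact regions
        for _ in range(n_samples):                         # marginalize over in-fills
            # Assumed interface: the in-filler proposes plausible content for
            # the masked region, conditioned on the visible pixels.
            with torch.no_grad():
                infill = infill_model(image, mask)
            composite = (1.0 - mask) * image + mask * infill
            log_probs = F.log_softmax(classifier(composite), dim=1)
            # Minimizing the log-probability of the original prediction finds
            # regions whose removal most changes the classifier's decision.
            loss = loss + log_probs.gather(1, target.unsqueeze(1)).mean() / n_samples
        opt.zero_grad()
        loss.backward()
        opt.step()

    return torch.sigmoid(mask_logits).detach()
```

In practice the continuous mask would be thresholded or annealed to obtain a discrete region, and the same setup can be flipped to search instead for the smallest region that preserves the prediction rather than destroys it.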