Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)
Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, Rory Sayres
arXiv e-Print archive - 2017
Keywords: stat.ML
First published: 2017/11/30

Abstract: The interpretation of deep learning models is a challenge due to their size, complexity, and often opaque internal state. In addition, many systems, such as image classifiers, operate on low-level features rather than high-level concepts. To address these challenges, we introduce Concept Activation Vectors (CAVs), which provide an interpretation of a neural net's internal state in terms of human-friendly concepts. The key idea is to view the high-dimensional internal state of a neural net as an aid, not an obstacle. We show how to use CAVs as part of a technique, Testing with CAVs (TCAV), that uses directional derivatives to quantify the degree to which a user-defined concept is important to a classification result -- for example, how sensitive a prediction of "zebra" is to the presence of stripes. Using the domain of image classification as a testing ground, we describe how CAVs may be used to explore hypotheses and generate insights for a standard image classification network as well as a medical application.
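
The core computation described in the abstract can be sketched briefly: fit a linear classifier that separates a layer's activations for concept examples (e.g., striped images) from activations for random counterexamples; the CAV is the unit vector normal to that decision boundary. Then take the directional derivative of a class logit (e.g., "zebra") along the CAV for each example of the class, and report the fraction of examples where that derivative is positive. The sketch below is a hedged illustration under those assumptions, not the authors' released implementation; the synthetic activations and gradients stand in for values you would extract from a real network via forward hooks and backpropagation.

```python
# Minimal TCAV sketch: CAV from a linear classifier, score from directional
# derivatives. All data here is synthetic; in practice, activations and
# gradients come from a trained network at a chosen layer.
import numpy as np
from sklearn.linear_model import LogisticRegression


def compute_cav(concept_acts: np.ndarray, random_acts: np.ndarray) -> np.ndarray:
    """Train a linear classifier separating concept vs. random activations.

    The CAV is the unit-normalized vector orthogonal to the decision
    boundary, pointing toward the concept examples.
    """
    X = np.concatenate([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    v = clf.coef_.ravel()
    return v / np.linalg.norm(v)


def tcav_score(class_grads: np.ndarray, cav: np.ndarray) -> float:
    """Fraction of class examples whose logit increases along the CAV.

    class_grads holds the gradient of the class logit with respect to the
    layer activations, one row per example of the class under study. The
    directional derivative along the CAV is the dot product grad . cav.
    """
    directional_derivs = class_grads @ cav
    return float(np.mean(directional_derivs > 0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy activations of dimension 64: concept ("striped") vs. random images.
    concept_acts = rng.normal(loc=0.5, size=(100, 64))
    random_acts = rng.normal(loc=0.0, size=(100, 64))
    cav = compute_cav(concept_acts, random_acts)

    # Stand-in gradients of the "zebra" logit w.r.t. the same layer's
    # activations; in a real setting these come from backprop.
    zebra_grads = rng.normal(loc=0.1, size=(200, 64))
    print("TCAV score:", tcav_score(zebra_grads, cav))
```

In the paper's framing, this score is computed against many CAVs trained on different random counterexample sets, and a statistical test over those scores guards against spurious concepts; the single-score sketch above omits that step for brevity.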