Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)
Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, Rory Sayres
arXiv e-Print archive - 2017
Keywords:
stat.ML
First published: 2017/11/30
Abstract: The interpretation of deep learning models is a challenge due to their size,
complexity, and often opaque internal state. In addition, many systems, such as
image classifiers, operate on low-level features rather than high-level
concepts. To address these challenges, we introduce Concept Activation Vectors
(CAVs), which provide an interpretation of a neural net's internal state in
terms of human-friendly concepts. The key idea is to view the high-dimensional
internal state of a neural net as an aid, not an obstacle. We show how to use
CAVs as part of a technique, Testing with CAVs (TCAV), that uses directional
derivatives to quantify the degree to which a user-defined concept is important
to a classification result--for example, how sensitive a prediction of "zebra"
is to the presence of stripes. Using the domain of image classification as a
testing ground, we describe how CAVs may be used to explore hypotheses and
generate insights for a standard image classification network as well as a
medical application.
Kim et al. propose Concept Activation Vectors (CAVs) that represent the direction of features corresponding to specific human-interpretable concepts. In particular, given a network for a classification task, a concept is defined by a set of example images exhibiting that concept. A linear classifier is then trained, on the activations of a chosen feature layer, to distinguish images with the concept from random images without it. The normal of the resulting linear decision boundary is the learned Concept Activation Vector (CAV). Taking the directional derivative of a class score along this direction for a given input quantifies how sensitive the prediction is to the concept. This way, images can be ranked and the model's sensitivity to particular concepts can be quantified. The idea is also illustrated in Figure 1.
https://i.imgur.com/KOqPeag.png
Figure 1: Process of constructing Concept Activation Vectors (CAVs).
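The two steps described above — fitting a linear classifier on layer activations to obtain the CAV, and counting how often the directional derivative of the class score along the CAV is positive — can be sketched roughly as follows. This is a minimal sketch, not the authors' implementation: `get_activations` and `grad_logit_wrt_activations` are assumed, hypothetical helpers for extracting layer activations and the gradient of the class logit with respect to those activations, and logistic regression stands in for the generic linear classifier used in the paper.

```python
# Minimal sketch of CAV construction and TCAV scoring.
# Assumptions (hypothetical helpers, not from the paper's code):
#   get_activations(images, layer) -> array of shape (n, d), layer activations
#   grad_logit_wrt_activations(image, layer, class_k) -> array of shape (d,)
import numpy as np
from sklearn.linear_model import LogisticRegression

def compute_cav(concept_images, random_images, layer):
    """Train a linear classifier on layer activations; its normal is the CAV."""
    acts_concept = get_activations(concept_images, layer)
    acts_random = get_activations(random_images, layer)
    X = np.concatenate([acts_concept, acts_random], axis=0)
    y = np.concatenate([np.ones(len(acts_concept)), np.zeros(len(acts_random))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    cav = clf.coef_.flatten()
    return cav / np.linalg.norm(cav)  # unit-length concept direction

def tcav_score(class_images, cav, layer, class_k):
    """Fraction of class-k inputs whose class score increases along the CAV."""
    positive = 0
    for image in class_images:
        grad = grad_logit_wrt_activations(image, layer, class_k)
        positive += float(np.dot(grad, cav) > 0)  # directional derivative sign
    return positive / len(class_images)
```

A TCAV score near 1 indicates that nudging the layer activations toward the concept direction increases the class score for most inputs of that class (e.g., "stripes" for "zebra"), while a score near 0.5 on random concepts serves as a baseline for significance testing.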
Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).