Explaining Explanations: An Approach to Evaluating Interpretability of Machine Learning on ShortScience.org


Explaining Explanations: An Approach to Evaluating Interpretability of Machine Learning
Gilpin, Leilani H. and Bau, David and Yuan, Ben Z. and Bajwa, Ayesha and Specter, Michael and Kagal, Lalana
- 2018 via Local Bibsonomy
Keywords: artificial, intelligence


Summary by Apoorva Shetty 6 years ago

As ML and AI solutions are increasingly applied to complex problems, the need to understand and explain these models grows. However, explanations vary both in how faithfully they reflect the model and in how well they explain its decisions in a human-understandable way.

**Idea**: There is no standard way of categorizing interpretation methods or explanations, and no established good practices in the field of interpretability.

**Solution**: This paper explores and categorizes different approaches to interpreting machine learning models. It proposes three main categories:
- Processing: interpretation approaches that use surrogate models to explain complex models.
- Representation: interpretation approaches that analyze the intermediate representations of data inside a model, e.g. the role and transferability of individual layers.
- Explanation-Producing: interpretation approaches in which the trained model, as part of its processing, also generates an explanation of its own behavior.
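To make the Processing category concrete, here is a minimal sketch of a surrogate-style explanation, assuming a LIME-like setup: perturb the input around one point, weight each sample by its proximity to that point, and fit a weighted linear model whose coefficients act as a local explanation. The model `black_box`, the helpers `solve` and `local_surrogate`, the Gaussian kernel, and all parameter values are hypothetical choices for illustration, not the paper's method.

```python
import math
import random


def black_box(x1, x2):
    # Hypothetical opaque model standing in for a complex network.
    return math.tanh(2.0 * x1 - x2) + 0.5 * x1 * x2


def solve(a, b):
    # Gaussian elimination with partial pivoting for a small dense system a @ w = b.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = s / m[r][r]
    return w


def local_surrogate(model, x0, n_samples=500, scale=0.1, seed=0):
    # Fit y ~ w0 + w1*x1 + ... near x0 by weighted least squares.
    # Sampling width and kernel width (both `scale`) are assumed values.
    rng = random.Random(seed)
    k = len(x0) + 1  # intercept plus one weight per feature
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for _ in range(n_samples):
        dx = [rng.gauss(0.0, scale) for _ in x0]
        x = [x0[i] + dx[i] for i in range(len(x0))]
        # Gaussian proximity kernel: nearby samples count more.
        weight = math.exp(-sum(d * d for d in dx) / (2 * scale ** 2))
        feats = [1.0] + x
        y = model(*x)
        for i in range(k):
            xty[i] += weight * feats[i] * y
            for j in range(k):
                xtx[i][j] += weight * feats[i] * feats[j]
    return solve(xtx, xty)  # [intercept, w1, w2, ...]


coeffs = local_surrogate(black_box, [0.2, 0.1])
print("intercept, w1, w2 =", [round(c, 3) for c in coeffs])
```

The signs and sizes of `w1` and `w2` are the "explanation": near this point, increasing `x1` raises the output and increasing `x2` lowers it. The surrogate is only faithful locally, which is why the choice of neighborhood matters.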

The paper examines individual interpretation approaches in detail, identifying the main component each one interprets and the proposed category it falls under. It also reviews other research papers that categorize or explore explanations, and discusses what explainability means in other domains.

The paper also discusses the tradeoff between "completeness" (how closely the explanation describes the underlying model) and "interpretability" (how easily humans can understand and trust it). The authors argue that this tradeoff exists not only in the final explanation but also within each category: the definition of completeness, and therefore the metric used to measure it, differs from one category to another. This makes sense given that different users have different views on how a model should behave and on what a desirable explanation of a result looks like.
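As one illustration of the completeness side of this tradeoff for a Processing-style explanation, the sketch below measures fidelity: how well surrogate polynomials of increasing size reproduce a black-box model's outputs on held-out inputs, scored as R² against the model itself. Using the number of terms as a proxy for interpretability is a crude assumption made here for illustration; `black_box`, `fit_polynomial`, `predict`, and `fidelity` are hypothetical names, not from the paper.

```python
import math


def black_box(x):
    # Hypothetical opaque model to be explained.
    return math.sin(2.0 * x) + 0.3 * x


def solve(a, b):
    # Gaussian elimination with partial pivoting for a small dense system a @ w = b.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_polynomial(xs, ys, degree):
    # Ordinary least squares via the normal equations for a polynomial surrogate.
    k = degree + 1
    xtx = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve(xtx, xty)


def predict(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))


def fidelity(model, coeffs, xs):
    # R^2 of the surrogate against the model's own outputs (1.0 = fully faithful).
    ys = [model(x) for x in xs]
    mean = sum(ys) / len(ys)
    ss_res = sum((y - predict(coeffs, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


train = [-1.5 + 3.0 * i / 99 for i in range(100)]
test = [-1.5 + 3.0 * (i + 0.5) / 50 for i in range(50)]
targets = [black_box(x) for x in train]

results = {}
for degree in (0, 1, 3, 5):
    coeffs = fit_polynomial(train, targets, degree)
    results[degree] = fidelity(black_box, coeffs, test)
    print(f"terms={degree + 1}  fidelity R^2={results[degree]:.3f}")
```

Smaller surrogates are easier to read but less faithful, and fidelity only makes sense for surrogate-based explanations; a Representation or Explanation-Producing method would need a different completeness measure, in line with the argument above.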
