First published: 2017/02/28 (7 years ago) Abstract: As machine learning systems become ubiquitous, there has been a surge of
interest in interpretable machine learning: systems that provide explanation
for their outputs. These explanations are often used to qualitatively assess
other criteria such as safety or non-discrimination. However, despite the
interest in interpretability, there is very little consensus on what
interpretable machine learning is and how it should be measured. In this
position paper, we first define interpretability and describe when
interpretability is needed (and when it is not). Next, we suggest a taxonomy
for rigorous evaluation and expose open questions towards a more rigorous
science of interpretable machine learning.