Producing radiologist-quality reports for interpretable artificial intelligence
William Gale
and
Luke Oakden-Rayner
and
Gustavo Carneiro
and
Andrew P Bradley
and
Lyle J Palmer
arXiv e-Print archive - 2018 via Local arXiv
Keywords:
cs.AI
First published: 2018/06/01 (6 years ago) Abstract: Current approaches to explaining the decisions of deep learning systems for
medical tasks have focused on visualising the elements that have contributed to
each decision. We argue that such approaches are not enough to "open the black
box" of medical decision making systems because they are missing a key
component that has been used as a standard communication tool between doctors
for centuries: language. We propose a model-agnostic interpretability method
that involves training a simple recurrent neural network model to produce
descriptive sentences to clarify the decision of deep learning classifiers.
We test our method on the task of detecting hip fractures from frontal pelvic
x-rays. This process requires minimal additional labelling despite producing
text containing elements that the original deep learning classification model
was not specifically trained to detect.
The experimental results show that: 1) the sentences produced by our method
consistently contain the desired information, 2) the generated sentences are
preferred by doctors compared to current tools that create saliency maps, and
3) the combination of visualisations and generated text is better than either
alone.
The paper presents a model-agnostic extension of deep learning classifiers based on a RNN with a visual attention mechanism for report generation.
![](https://i.imgur.com/3TQb5TG.png)
One of the most important points in this paper is not the model, but the dataset they itself: Luke Oakden-Rayner, one of the authors, is a radiologist and worked a lot to educate the public on current medical datasets ([chest x-ray blog post](https://lukeoakdenrayner.wordpress.com/2017/12/18/the-chestxray14-dataset-problems/)), how they are made and what are the problems associated with them. In this paper they used 50,363 frontal pelvic X-rays, containing 4,010 hip fractures, the original dataset contained descriptive sentences, but these had highly inconsistent structure and content. A radiologist created a new set of sentences more appropriate to the task, from their [blog post](https://lukeoakdenrayner.wordpress.com/2018/06/05/explain-yourself-machine-producing-simple-text-descriptions-for-ai-interpretability/):
> We simply created sentences with a fixed grammatical structure and a tiny vocabulary (26 words!). We stripped the task back to the simplest useful elements. For example: “There is a mildly displaced comminuted fracture of the left neck of the femur.” Using sentences like that we build a RNN to generate text*, on top of the detection model.
>And that is the research in a nutshell! No fancy new models, no new maths or theoretical breakthroughs. Just sensible engineering to make the task tractable.
This paper shows the importance of a well-built dataset in medical imaging and how it can thus lead to impressive results:
![](https://i.imgur.com/5BVU9WF.png)