Summary by Tess Berthier
The paper presents a model-agnostic extension of deep learning classifiers: an RNN with a visual attention mechanism for report generation.
![](https://i.imgur.com/3TQb5TG.png)
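The mechanics of the visual attention described above can be sketched as follows. This is an illustrative toy, not the authors' code: at each decoding step the RNN hidden state scores every spatial position of the CNN feature map, and a softmax-weighted sum of the features becomes the context vector that conditions the next word. All shapes and weight names here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(features, hidden, W_f, W_h, v):
    """Additive (Bahdanau-style) attention over a flattened feature map.

    features: (L, D) CNN feature map, one D-dim vector per spatial location
    hidden:   (H,)   current RNN decoder state
    Returns the context vector (D,) and the attention weights (L,).
    """
    scores = np.tanh(features @ W_f + hidden @ W_h) @ v  # (L,) one score per location
    alpha = softmax(scores)                              # attention weights sum to 1
    context = alpha @ features                           # weighted sum of features
    return context, alpha

# Toy dimensions: a 7x7 feature grid (L=49), D=8 channels, H=16 RNN units
L, D, H, A = 49, 8, 16, 8
features = rng.standard_normal((L, D))
hidden = rng.standard_normal(H)
W_f = rng.standard_normal((D, A))
W_h = rng.standard_normal((H, A))
v = rng.standard_normal(A)

context, alpha = attend(features, hidden, W_f, W_h, v)
```

The attention weights `alpha` are what produce the heatmaps shown in the figure: they indicate which image regions the model attended to while emitting each word.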
One of the most important points in this paper is not the model, but the dataset itself: Luke Oakden-Rayner, one of the authors, is a radiologist and has worked extensively to educate the public on current medical datasets ([chest x-ray blog post](https://lukeoakdenrayner.wordpress.com/2017/12/18/the-chestxray14-dataset-problems/)), how they are made, and what problems are associated with them. In this paper they used 50,363 frontal pelvic X-rays containing 4,010 hip fractures. The original dataset contained descriptive sentences, but these had highly inconsistent structure and content, so a radiologist created a new set of sentences more appropriate to the task. From their [blog post](https://lukeoakdenrayner.wordpress.com/2018/06/05/explain-yourself-machine-producing-simple-text-descriptions-for-ai-interpretability/):
> We simply created sentences with a fixed grammatical structure and a tiny vocabulary (26 words!). We stripped the task back to the simplest useful elements. For example: “There is a mildly displaced comminuted fracture of the left neck of the femur.” Using sentences like that we build a RNN to generate text*, on top of the detection model.
> And that is the research in a nutshell! No fancy new models, no new maths or theoretical breakthroughs. Just sensible engineering to make the task tractable.
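To make the "fixed grammatical structure and a tiny vocabulary" concrete, here is a hypothetical sketch of how such template sentences might be rendered from categorical findings. The attribute names and value lists are illustrative assumptions, not the authors' actual schema; only the example sentence itself comes from the quote above.

```python
# Illustrative (assumed) categorical slots for the fixed-grammar sentences.
DISPLACEMENT = ["undisplaced", "minimally displaced", "mildly displaced"]
PATTERN = ["simple", "comminuted"]
SIDE = ["left", "right"]
LOCATION = ["neck of the femur", "intertrochanteric region"]

def report_sentence(displacement, pattern, side, location):
    """Render one fixed-structure report sentence from categorical findings."""
    return (f"There is a {displacement} {pattern} fracture "
            f"of the {side} {location}.")

def no_fracture_sentence():
    return "There is no fracture."

sentence = report_sentence("mildly displaced", "comminuted",
                           "left", "neck of the femur")
# → "There is a mildly displaced comminuted fracture of the left neck of the femur."
```

Because every target sentence is an instance of this rigid template over a tiny vocabulary, the RNN decoder only has to learn to fill a handful of slots, which is what makes the generation task tractable.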
This paper shows the importance of a well-built dataset in medical imaging and how it can lead to impressive results:
![](https://i.imgur.com/5BVU9WF.png)