The paper presents a model-agnostic extension of deep learning classifiers: an RNN with a visual attention mechanism that generates a text report on top of the classifier (a minimal code sketch of this setup is given below).

![](https://i.imgur.com/3TQb5TG.png)

One of the most important points of this paper is not the model but the dataset itself. Luke Oakden-Rayner, one of the authors, is a radiologist who has worked extensively to educate the public about current medical datasets ([chest x-ray blog post](https://lukeoakdenrayner.wordpress.com/2017/12/18/the-chestxray14-dataset-problems/)): how they are made and what problems are associated with them.

In this paper they used 50,363 frontal pelvic X-rays containing 4,010 hip fractures. The original dataset came with descriptive sentences, but these had highly inconsistent structure and content, so a radiologist created a new set of sentences better suited to the task. From their [blog post](https://lukeoakdenrayner.wordpress.com/2018/06/05/explain-yourself-machine-producing-simple-text-descriptions-for-ai-interpretability/):

> We simply created sentences with a fixed grammatical structure and a tiny vocabulary (26 words!). We stripped the task back to the simplest useful elements. For example: “There is a mildly displaced comminuted fracture of the left neck of the femur.” Using sentences like that we build a RNN to generate text*, on top of the detection model.
>
> And that is the research in a nutshell! No fancy new models, no new maths or theoretical breakthroughs. Just sensible engineering to make the task tractable.

This paper shows the importance of a well-built dataset in medical imaging and how it can lead to impressive results:

![](https://i.imgur.com/5BVU9WF.png)
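To make the setup above concrete, here is a minimal sketch (not the authors' implementation) of the general idea: a frozen classifier supplies a grid of spatial features, and a small GRU decoder with additive visual attention emits the fixed-grammar report one token at a time. All names, layer sizes, and the PyTorch framing are illustrative assumptions; only the 26-word vocabulary comes from the quote above.

```python
# Sketch of an attention-RNN report generator on top of a frozen
# image model. Hypothetical names and dimensions throughout.
import torch
import torch.nn as nn

class AttentionDecoder(nn.Module):
    def __init__(self, vocab_size=26, feat_dim=512, hidden_dim=256, embed_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # additive (Bahdanau-style) attention over the classifier's feature grid
        self.att_feat = nn.Linear(feat_dim, hidden_dim)
        self.att_hid = nn.Linear(hidden_dim, hidden_dim)
        self.att_score = nn.Linear(hidden_dim, 1)
        self.rnn = nn.GRUCell(embed_dim + feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, feats, tokens):
        # feats:  (B, N, feat_dim) spatial features from the frozen classifier
        # tokens: (B, T) ground-truth report tokens (teacher forcing)
        B = feats.size(0)
        h = torch.zeros(B, self.rnn.hidden_size, device=feats.device)
        logits = []
        for t in range(tokens.size(1)):
            # attention weights over the N spatial locations, given the state h
            scores = self.att_score(torch.tanh(
                self.att_feat(feats) + self.att_hid(h).unsqueeze(1)))   # (B, N, 1)
            alpha = scores.softmax(dim=1)
            context = (alpha * feats).sum(dim=1)                        # (B, feat_dim)
            # feed the current word plus the attended visual context to the GRU
            h = self.rnn(torch.cat([self.embed(tokens[:, t]), context], dim=-1), h)
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)                               # (B, T, vocab_size)
```

Presumably training is plain cross-entropy against the radiologist-written sentences while the detection model stays frozen, and at test time the decoder is run greedily, feeding each predicted word back in; with a fixed grammar and a 26-word vocabulary, even greedy decoding seems plausible.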