Clinical Intervention Prediction and Understanding using Deep Networks
Harini Suresh, Nathan Hunt, Alistair Johnson, Leo Anthony Celi, Peter Szolovits, Marzyeh Ghassemi
arXiv e-Print archive - 2017
Keywords:
cs.LG
First published: 2017/05/23
Abstract: Real-time prediction of clinical interventions remains a challenge within
intensive care units (ICUs). This task is complicated by data sources that are
noisy, sparse, heterogeneous and outcomes that are imbalanced. In this paper,
we integrate data from all available ICU sources (vitals, labs, notes,
demographics) and focus on learning rich representations of this data to
predict onset and weaning of multiple invasive interventions. In particular, we
compare both long short-term memory networks (LSTM) and convolutional neural
networks (CNN) for prediction of five intervention tasks: invasive ventilation,
non-invasive ventilation, vasopressors, colloid boluses, and crystalloid
boluses. Our predictions are done in a forward-facing manner to enable
"real-time" performance, and predictions are made with a six hour gap time to
support clinically actionable planning. We achieve state-of-the-art results on
our predictive tasks using deep architectures. We explore the use of feature
occlusion to interpret LSTM models, and compare this to the interpretability
gained from examining inputs that maximally activate CNN outputs. We show that
our models are able to significantly outperform baselines in intervention
prediction, and provide insight into model learning, which is crucial for the
adoption of such models in practice.
#### Goal:
Predict interventions on ICU patients using LSTM and CNN.
#### Dataset:
MIMIC-III v.1.4 https://mimic.physionet.org/
+ Patients over 15 years of age with an intensive care stay between 12h and 240h (only the first stay is considered for each patient): 34,148 unique records.
+ 5 static variables.
+ 29 vital signs and test results.
+ Clinical notes of patients (presented as time series).
#### Feature Engineering:
+ Topic Modeling of clinical notes: Vector of topics using Latent Dirichlet Allocation (LDA)
+ Physiological Words: vital/lab results are converted to z-scores, rounded and clipped to integer values between -4 and 4, and each integer score is one-hot encoded (each vital/lab is replaced by 9 columns). This avoids imputing missing values: a missing measurement is simply encoded as the all-zero vector.
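The physiological-word encoding can be sketched as follows (a minimal illustration, assuming per-variable training-set means and standard deviations are available; function and variable names are mine, not the paper's):

```python
import numpy as np

def physiological_word(value, mean, std):
    """Encode one vital/lab value as a 9-dimensional one-hot 'word'.

    The value is z-scored, rounded, clipped to [-4, 4], and one-hot
    encoded. A missing value (NaN) maps to the all-zero vector, so no
    imputation is needed.
    """
    word = np.zeros(9)
    if value is None or np.isnan(value):
        return word  # missing measurement -> all-zero word
    z = int(np.clip(round((value - mean) / std), -4, 4))
    word[z + 4] = 1.0  # shift z-score from [-4, 4] to index [0, 8]
    return word

# Example: a heart rate of 130 bpm with mean 85 and std 15 gives z = +3,
# which sets index 7 of the 9-dimensional word.
print(physiological_word(130, 85, 15))
```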
Feature vector:
+ Concatenation of the static variables, the physiological words for each vital/lab, and the topic vector.
+ 1 feature vector / patient / hour.
+ A 6-hour slice is used to predict a 4-hour window that starts after a 6-hour gap. All feature values are normalized between 0 and 1 (static variables are replicated at every hour).
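The forward-facing windowing (6-hour input slice, 6-hour gap, 4-hour prediction window) can be sketched like this, assuming an hourly feature matrix `X` of shape `(hours, features)` and hourly intervention labels `y` (a sketch; names and shapes are assumptions):

```python
import numpy as np

def make_windows(X, y, slice_len=6, gap=6, pred_len=4):
    """Return (input_slice, prediction_window) pairs for one ICU stay.

    For an input slice covering hours [t, t + slice_len), the model
    predicts intervention status over the window of pred_len hours that
    starts after a gap of `gap` hours.
    """
    pairs = []
    horizon = slice_len + gap + pred_len
    for t in range(len(X) - horizon + 1):
        x_slice = X[t : t + slice_len]
        y_window = y[t + slice_len + gap : t + horizon]
        pairs.append((x_slice, y_window))
    return pairs

# Example: a 20-hour stay yields 20 - 16 + 1 = 5 windows.
X = np.random.rand(20, 3)    # 20 hourly feature vectors, 3 features
y = np.zeros(20, dtype=int)  # hourly intervention indicator
print(len(make_windows(X, y)))  # -> 5
```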
#### Target Classes:
For some of the procedures to be predicted there are 4 classes:
+ Onset: Y goes from 0 to 1 during the prediction window.
+ Wean: Y goes from 1 to 0 during the prediction window.
+ Stay On: Y stays at 1 throughout prediction window.
+ Stay Off: Y stays at 0 for the entire prediction window.
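One way to derive these four classes from the binary intervention trajectory in the prediction window (a sketch of the idea; the tie-breaking for trajectories with multiple switches is my assumption, not stated in the paper):

```python
def window_class(y_window):
    """Map a binary intervention trajectory to one of the four classes."""
    first, last = y_window[0], y_window[-1]
    if first == 0 and last == 1:
        return "onset"      # goes from 0 to 1 during the window
    if first == 1 and last == 0:
        return "wean"       # goes from 1 to 0 during the window
    if all(v == 1 for v in y_window):
        return "stay_on"    # 1 throughout the window
    if all(v == 0 for v in y_window):
        return "stay_off"   # 0 throughout the window
    # multiple switches with equal endpoints: classify by the endpoint
    return "onset" if last == 1 else "wean"

print(window_class([0, 0, 1, 1]))  # -> onset
print(window_class([1, 1, 1, 1]))  # -> stay_on
```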
#### Setup of the Experiments:
+ Dataset Split: 70% training, 10% validation, 20% test.
Long Short-Term Memory (LSTM) Networks:
+ Dropout P(keep) = 0.8, L2 regularization.
+ 2 hidden layers: 512 nodes in each.
Convolutional Neural Networks:
+ 3 different temporal granularities (3, 4, 5 hours). 64 filters in each.
+ Features are treated as channels. 1D temporal convolution.
+ Dropout between fully connected layers. P(keep) = 0.5.
TensorFlow 1.0.1 - Adam optimizer. Minibatches of size 128.
Validation set used for early stopping (metric: AUC).
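The "features as channels" 1D temporal convolution can be illustrated with plain NumPy (a sketch of the operation, not the paper's TensorFlow code; the input shapes are assumptions):

```python
import numpy as np

def temporal_conv1d(x, w):
    """Valid 1D convolution over time, treating features as channels.

    x: (time, channels) input, e.g. a 6-hour slice of feature vectors.
    w: (kernel_size, channels, filters) weights, e.g. kernel_size = 3
       for the 3-hour temporal granularity with 64 filters.
    Returns a (time - kernel_size + 1, filters) feature map.
    """
    k, _, f = w.shape
    t = x.shape[0] - k + 1
    out = np.zeros((t, f))
    for i in range(t):
        # each output step sums over the kernel window and all channels
        out[i] = np.tensordot(x[i : i + k], w, axes=([0, 1], [0, 1]))
    return out

x = np.random.rand(6, 10)      # 6 hours, 10 feature channels
w = np.random.rand(3, 10, 64)  # 3-hour kernel, 64 filters
print(temporal_conv1d(x, w).shape)  # -> (4, 64)
```

In the paper this is done at three kernel sizes (3, 4, and 5 hours) in parallel, with 64 filters each, before the fully connected layers.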
#### Results:
+ Baseline for comparison: L2-regularized Logistic Regression
+ Metrics:
+ AUC per class.
+ AUC macro = Arithmetic mean of AUC per class.
+ The proposed architectures outperform the baseline.
+ Physiological words improve performance (especially in scenarios with high class imbalance).
#### Model Interpretability:
+ LSTM: feature-occlusion-style analysis. Each feature is replaced by uniformly distributed noise between 0 and 1, and the resulting change in AUC is computed.
+ CNN: analysis of the maximally activating trajectories.
#### Positive Aspects:
+ Relevant work: in the healthcare domain it is very important to anticipate events.
+ Built on top of rich and heterogeneous data: It leverages large amounts of ICU data.
+ The proposed model is not a complete black-box. Interpretability is crucial if the system is to be adopted in the future.
#### Caveats:
+ Some of the methodology is not clearly explained:
+ How was the dataset split performed? Was it done at the patient level?
+ When testing the logistic regression baseline, it is not clear how the feature vector was built. Was it simply the flattened 6-hour chunk?
+ For the raw-data test, it is not mentioned how missing values were treated.