Effective Ways to Build and Evaluate Individual Survival Distributions
Humza Haider, Bret Hoehn, Sarah Davis, and Russell Greiner
arXiv e-Print archive - 2018 via Local arXiv
Keywords: cs.LG, stat.ML
First published: 2018/11/28
Abstract: An accurate model of a patient's individual survival distribution can help
determine the appropriate treatment for terminal patients. Unfortunately, risk
scores (e.g., from Cox Proportional Hazard models) do not provide survival
probabilities, single-time probability models (e.g., the Gail model, predicting
5 year probability) only provide for a single time point, and standard
Kaplan-Meier survival curves provide only population averages for a large class
of patients meaning they are not specific to individual patients. This
motivates an alternative class of tools that can learn a model which provides
an individual survival distribution which gives survival probabilities across
all times - such as extensions to the Cox model, Accelerated Failure Time, an
extension to Random Survival Forests, and Multi-Task Logistic Regression. This
paper first motivates such "individual survival distribution" (ISD) models, and
explains how they differ from standard models. It then discusses ways to
evaluate such models - namely Concordance, 1-Calibration, Brier score, and
various versions of L1-loss - and then motivates and defines a novel approach
"D-Calibration", which determines whether a model's probability estimates are
meaningful. We also discuss how these measures differ, and use them to evaluate
several ISD prediction tools, over a range of survival datasets.
The paper looks at approaches to predicting individual survival distributions (ISDs), i.e. a full survival-probability curve over time for each patient. The motivation is shown in the figure below: survival time can vary greatly between two patients, so a model should be able to predict a patient-specific distribution like the red curve.
https://i.imgur.com/2r9JvUp.png
The paper studies the following methods:
- class-based Kaplan-Meier survival curves [31]
- Kalbfleisch-Prentice extension of the Cox model (cox-kp) [29]
- Accelerated Failure Time (aft) model [29]
- Random Survival Forest model with Kaplan-Meier extensions (rsf-km)
- elastic net Cox (coxen-kp) [55]
- Multi-task Logistic Regression (mtlr) [57]
Looking at the predictions of these methods side by side, we can observe some systematic differences between them:
https://i.imgur.com/vJoCL4a.png
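For reference, the Kaplan-Meier estimator behind the class-based baseline in the list above can be sketched in a few lines. This is a minimal illustration, not the paper's code: the function name `kaplan_meier` and the input layout (parallel lists of times and event flags, where 1 means the event was observed and 0 means censored) are assumptions made here.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) pairs at each distinct event time.

    times  : observed time for each patient (event or censoring time)
    events : 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same observed time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit update: multiply by the fraction surviving time t.
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        # Both events and censored patients leave the risk set.
        n_at_risk -= removed
    return curve


curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
print(curve)
```

The resulting curve is a single population-level estimate shared by every patient in the class, which is the limitation that motivates the individual models.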
The paper presents a "D-Calibration" (distributional calibration) metric, which measures whether a method's predicted survival curves answer this question affirmatively:
Should the patient believe the predictions implied by the survival curve?
https://i.imgur.com/MX8CbZ7.png
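The idea can be sketched as follows. If a model's curves are trustworthy, the predicted survival probability evaluated at each patient's actual event time should be spread roughly uniformly over [0, 1], which can be checked with a Pearson chi-squared statistic over equal-width bins. This is a simplified sketch: it assumes uncensored patients only and does not cover how censored patients would be handled. The function name, the choice of 10 bins, and the 0.05 critical value for 9 degrees of freedom (16.919) are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple


def d_calibration_statistic(
    survival_at_event: Sequence[float], n_bins: int = 10
) -> Tuple[float, List[int]]:
    """Pearson chi-squared statistic for uniformity of S_i(t_i).

    survival_at_event : for each uncensored patient i, the model's predicted
                        survival probability at that patient's event time.
    Returns the statistic and the per-bin counts.
    """
    n = len(survival_at_event)
    counts = [0] * n_bins
    for p in survival_at_event:
        # Equal-width bins on [0, 1]; p == 1.0 goes into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        counts[idx] += 1
    expected = n / n_bins
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return stat, counts


# Critical value of the chi-squared distribution, 9 degrees of freedom, alpha = 0.05.
CRITICAL_9DF_05 = 16.919

# Evenly spread probabilities: consistent with a D-calibrated model.
well_spread = [(i + 0.5) / 100 for i in range(100)]
stat_good, _ = d_calibration_statistic(well_spread)

# Probabilities piled into one bin: the model is systematically off.
piled_up = [0.95] * 100
stat_bad, _ = d_calibration_statistic(piled_up)

print(stat_good, stat_good > CRITICAL_9DF_05)
print(stat_bad, stat_bad > CRITICAL_9DF_05)
```

A statistic above the critical value suggests rejecting the hypothesis that the model is D-calibrated; a statistic below it means the test found no evidence of miscalibration.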