WAIC, but Why? Generative Ensembles for Robust Anomaly Detection
Hyunsun Choi, Eric Jang, and Alexander A. Alemi
arXiv e-Print archive - 2018
Keywords:
stat.ML, cs.LG
First published: 2018/10/02
Abstract: Machine learning models encounter Out-of-Distribution (OoD) errors when the
data seen at test time are generated from a different stochastic generator than
the one used to generate the training data. One proposal to scale OoD detection
to high-dimensional data is to learn a tractable likelihood approximation of
the training distribution, and use it to reject unlikely inputs. However,
likelihood models on natural data are themselves susceptible to OoD errors, and
even assign large likelihoods to samples from other datasets. To mitigate this
problem, we propose Generative Ensembles, which robustify density-based OoD
detection by way of estimating epistemic uncertainty of the likelihood model.
We present a puzzling observation in need of an explanation -- although
likelihood measures cannot account for the typical set of a distribution, and
therefore should not be suitable on their own for OoD detection, WAIC performs
surprisingly well in practice.
### Summary
Knowing when a model is qualified to make a prediction is critical to the safe deployment of ML technology. Model-independent, unsupervised Out-of-Distribution (OoD) detection is appealing mostly because it doesn't require task-specific labels to train. It is tempting to use a simple one-tailed test in which inputs assigned low likelihood by a likelihood model are declared OoD, but the intuition that In-Distribution (ID) inputs should have the highest likelihoods _does not hold in high dimensions_. The authors propose to use the Watanabe-Akaike Information Criterion (WAIC) to circumvent this problem and empirically show the robustness of the approach.
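To see why, consider the textbook counterexample of a standard Gaussian in $d$ dimensions (a generic illustration, not specific to the paper's models):

$$\log p(x) = -\frac{d}{2}\log(2\pi) - \frac{\lVert x \rVert^2}{2}, \qquad \mathbb{E}\left[\lVert x \rVert^2\right] = d$$

The density is maximized at $x = 0$, yet $\lVert x \rVert$ concentrates tightly around $\sqrt{d}$, so essentially all probability mass sits on a thin shell far from the mode. A one-tailed likelihood test would rank the never-observed origin as more in-distribution than any typical sample.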
### Counterintuitive Properties of Likelihood Models
https://i.imgur.com/4vo0Ff5.png
A GLOW model with a Gaussian prior, trained on CIFAR-10, maps SVHN closer to the origin of latent space than CIFAR-10 itself (yet it never actually generates SVHN-like samples, because high-dimensional Gaussian samples lie on a thin shell, far from the origin). This is bad news for OoD detection: the OoD dataset receives higher likelihoods than the training data.
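A quick numerical check of the shell phenomenon (my own sketch, not code from the paper), using the CIFAR/SVHN dimensionality $d = 32 \times 32 \times 3 = 3072$:

```python
import numpy as np

# Samples from a standard Gaussian in d dimensions concentrate on a thin
# shell of radius ~sqrt(d); the mode at the origin is never sampled.
d = 3072                                # 32 * 32 * 3, as in CIFAR-10 / SVHN
rng = np.random.default_rng(0)
z = rng.standard_normal((10_000, d))
norms = np.linalg.norm(z, axis=1)
print(norms.mean(), norms.std())        # ~55.4 +/- ~0.7: a thin shell

# Log-density under N(0, I_d), dropping the constant -d/2 * log(2*pi):
log_p = lambda x: -0.5 * np.sum(x ** 2, axis=-1)
print(log_p(np.zeros(d)))               # ~0: the origin is the mode...
print(log_p(z).mean())                  # ~ -d/2: ...yet no sample is near it
```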
### Proposed Methodology
Use the WAIC criterion for OoD detection, which gives an asymptotically correct estimate of the gap between the training-set and test-set expectations:
https://i.imgur.com/vasSxuk.png
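In LaTeX, the score in the image above is, per input $x$:

$$\mathrm{WAIC}(x) = \mathbb{E}_{\theta}\left[\log p(x \mid \theta)\right] - \mathrm{Var}_{\theta}\left[\log p(x \mid \theta)\right]$$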
Basically, the correction term subtracts the variance in log-likelihoods across independent samples from the posterior. This robustifies the estimate, ensuring that inputs whose likelihood is sensitive to the particular choice of posterior sample are penalized. The authors use an ensemble of generative models as a proxy for posterior samples, i.e., the ensemble members act as approximate posterior samples.
Now, OoD inputs can be detected with a likelihood model:
https://i.imgur.com/M3CDKOA.png
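A minimal sketch of the scoring procedure (my paraphrase, not the authors' code), assuming each ensemble member exposes a hypothetical `log_prob(x)` returning per-example log-likelihoods:

```python
import numpy as np

def waic_score(models, x):
    """WAIC(x) = E_theta[log p(x | theta)] - Var_theta[log p(x | theta)],
    with the ensemble of independently trained generative models standing
    in for samples theta from the posterior."""
    log_ps = np.stack([m.log_prob(x) for m in models])  # (n_models, batch)
    return log_ps.mean(axis=0) - log_ps.var(axis=0)

def is_ood(models, x, threshold):
    # One-tailed test: flag inputs whose variance-penalized likelihood
    # falls below a threshold chosen on held-out in-distribution data.
    return waic_score(models, x) < threshold
```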
### Discussion
Interestingly, GLOW maps CIFAR-10 and other datasets INSIDE the Gaussian shell (a thin annulus of radius $\sqrt{d} = \sqrt{3072} \approx 55.4$):
https://i.imgur.com/ERdgOaz.png
This is in itself quite disturbing, as it suggests that better flow-based generative models (for sampling) could be obtained by encouraging the encoded training distribution to overlap better with the typical set in latent space.
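Read literally, that suggestion could look something like the following purely illustrative auxiliary loss (hypothetical, not proposed in the paper), which nudges encoded latents $z = f(x)$ toward the typical-set shell:

```python
import numpy as np

def typical_set_penalty(z):
    # Hypothetical regularizer, not from the paper: penalize latents whose
    # norm strays from the Gaussian typical-set radius sqrt(d).
    d = z.shape[-1]
    radii = np.linalg.norm(z, axis=-1)
    return np.mean((radii - np.sqrt(d)) ** 2)  # zero when latents sit on the shell
```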