This paper presents an interpretation of dropout training as performing approximate Bayesian learning in a deep Gaussian process (DGP) model. This connection suggests a very simple way of obtaining, for networks trained with dropout, estimates of the model's output uncertainty. The estimate is computed from an ensemble of networks, each obtained by sampling a new dropout mask.
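To make the estimate concrete, here is how I read the estimator, written in my own notation rather than copied from the paper. Assume $T$ stochastic forward passes at a test input $x^*$, each with a freshly sampled dropout mask and producing an output row vector $\hat{y}_t(x^*)$, and a model precision $\tau$ (a hyperparameter whose exact expression I leave to the paper):

$$
\mathbb{E}[y^*] \approx \frac{1}{T}\sum_{t=1}^{T} \hat{y}_t(x^*),
\qquad
\mathrm{Var}[y^*] \approx \tau^{-1} I + \frac{1}{T}\sum_{t=1}^{T} \hat{y}_t(x^*)^\top \hat{y}_t(x^*) - \mathbb{E}[y^*]^\top \mathbb{E}[y^*]
$$

In words: the predictive mean is the sample mean of the stochastic passes, and the predictive variance is their sample variance plus the inverse precision.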
#### My two cents
This is a really nice and thought-provoking contribution to our understanding of dropout. Unfortunately, the paper doesn't provide many comparisons, either with other ways of estimating the predictive uncertainty of deep networks or with other approximate inference schemes in deep GPs (actually, see update below). The qualitative examples provided, however, do suggest that the uncertainty estimate isn't terrible.
Irrespective of the quality of the uncertainty estimate suggested here, I find the observation itself really valuable. Perhaps future research will then shed light on how useful that method is compared to other approaches, including Bayesian dark knowledge \cite{conf/nips/BalanRMW15}.
`Update: On September 27th`, the authors uploaded to arXiv a new version that now includes comparisons with two alternative Bayesian learning methods for deep networks, specifically the stochastic variational inference approach of Graves and the probabilistic back-propagation of Hernandez-Lobato and Adams. Dropout actually does very well against these baselines and, across datasets, is almost always amongst the best-performing methods!
#### Problem addressed:
Bayesian approximation of neural networks
#### Summary:
This paper gives an alternative view of dropout as a Bayesian approximation, which allows one to obtain uncertainty estimates for the predictions. The result is surprisingly simple: both the predictive mean and the predictive variance can be obtained by calculating the mean and variance (with some minor adjustment) of multiple passes through the network with dropout kept on.
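The procedure can be sketched in a few lines of NumPy. This is a minimal illustration under my own assumptions, not the authors' code: the helper `mc_dropout_predict`, the ReLU MLP, the random (untrained) weights, and the parameters `p_drop`, `n_samples`, and `tau` are all hypothetical names chosen for the example. I take the "minor adjustment" to be adding the inverse precision `1 / tau` to the sample variance.

```python
import numpy as np


def mc_dropout_predict(x, weights, biases, p_drop=0.5, n_samples=100,
                       tau=None, rng=None):
    """Monte Carlo dropout for a small ReLU MLP (illustrative helper).

    Runs `n_samples` stochastic forward passes, each with fresh dropout
    masks on the hidden activations, and returns the per-output sample
    mean and variance. If `tau` is given, 1 / tau is added to the variance.
    """
    rng = np.random.default_rng() if rng is None else rng
    outputs = []
    for _ in range(n_samples):
        h = x
        for i, (W, b) in enumerate(zip(weights, biases)):
            if i > 0:
                # Fresh Bernoulli mask per pass; inverted-dropout scaling
                # keeps the expected activation unchanged.
                mask = rng.random(h.shape) >= p_drop
                h = h * mask / (1.0 - p_drop)
            h = h @ W + b
            if i < len(weights) - 1:
                h = np.maximum(h, 0.0)  # ReLU on hidden layers only
        outputs.append(h)

    outputs = np.stack(outputs)   # shape: (n_samples, n_points, n_outputs)
    mean = outputs.mean(axis=0)   # predictive mean
    var = outputs.var(axis=0)     # spread across dropout masks
    if tau is not None:
        var = var + 1.0 / tau     # assumed adjustment: add inverse precision
    return mean, var


# Usage with random weights standing in for a trained network.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(1, 50)), rng.normal(size=(50, 1))]
biases = [np.zeros(50), np.zeros(1)]
x = np.linspace(-1.0, 1.0, 5).reshape(-1, 1)

mean, var = mc_dropout_predict(x, weights, biases, p_drop=0.5,
                               n_samples=200, tau=10.0, rng=rng)
print(mean.shape, var.shape)
```

With `p_drop=0` every pass is identical, so the sample variance collapses to zero and only the `1 / tau` term remains, which is a quick sanity check on the sketch.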
#### Novelty:
A new interpretation of dropout as a Bayesian approximation.
#### Drawbacks:
Some computational overhead, since calculating the predictive mean and variance needs multiple passes through the network.
#### Datasets:
MNIST, solar irradiance, Mauna Loa CO2
#### Resources:
Paper: http://arxiv.org/pdf/1506.02142v1.pdf
Blog post: http://mlg.eng.cam.ac.uk/yarin/blog_3d801aa532c1ce.html
#### Presenter:
Yingbo Zhou