Decomposition of Uncertainty in Bayesian Deep Learning for Efficient and Risk-sensitive Learning
Stefan Depeweg, José Miguel Hernández-Lobato, Finale Doshi-Velez, Steffen Udluft
arXiv e-Print archive - 2017
Keywords:
stat.ML, cs.LG
First published: 2017/10/19
Abstract: Bayesian neural networks with latent variables (BNNs+LVs) are scalable and flexible probabilistic models: They account for uncertainty in the estimation of the network weights and, by making use of latent variables, they can capture complex noise patterns in the data. In this work, we show how to separate these two forms of uncertainty for decision-making purposes. This decomposition allows us to successfully identify informative points for active learning of functions with heteroskedastic and bimodal noise. We also demonstrate how this decomposition allows us to define a novel risk-sensitive reinforcement learning criterion to identify policies that balance expected cost, model-bias and noise averseness.
The paper starts from the BNN with latent variables (BNN+LV) and proposes an entropy-based and a variance-based measure of prediction uncertainty. For each measure, the authors derive a decomposition into an aleatoric term and an epistemic term. A toy regression experiment illustrates the decomposition and the behavior of each uncertainty measure. The authors then plug the decomposition into an active learning scheme on the same toy problems: in each batch, the learner actively selects which data points to label. The results show that using epistemic uncertainty alone outperforms using total uncertainty, and both outperform a simple Gaussian process baseline. This is understandable, since epistemic uncertainty directly reflects uncertainty over the model weights and is reducible with more data, whereas sampling from regions of high aleatoric uncertainty does not help supervised learning because that noise is irreducible. A sketch of the variance-based decomposition is below.
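To make the variance-based decomposition concrete, here is a minimal Monte Carlo sketch following the law of total variance; the names `predict` and `sample_weights` and the toy heteroskedastic model are placeholders of mine, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)

def decompose_uncertainty(predict, sample_weights, x, n_w=50, n_z=200):
    """Monte Carlo estimate of the variance-based decomposition at input x:
        Var(y) = E_q(W)[Var(y|W)]   (aleatoric, irreducible noise)
               + Var_q(W)(E[y|W])   (epistemic, weight uncertainty)
    """
    cond_means, cond_vars = [], []
    for _ in range(n_w):
        w = sample_weights()                        # draw W ~ q(W)
        ys = np.array([predict(x, w, rng.standard_normal())
                       for _ in range(n_z)])        # y = f(x, z; W), z ~ N(0, 1)
        cond_means.append(ys.mean())
        cond_vars.append(ys.var())
    aleatoric = float(np.mean(cond_vars))           # E_W[Var(y|W)]
    epistemic = float(np.var(cond_means))           # Var_W(E[y|W])
    return aleatoric, epistemic

# Toy stand-in for a BNN+LV: uncertain slope, noise scale growing with |x|.
sample_weights = lambda: rng.normal(1.0, 0.1)
predict = lambda x, w, z: w * x + 0.5 * abs(x) * z

al, ep = decompose_uncertainty(predict, sample_weights, x=2.0)
print(f"aleatoric={al:.3f}  epistemic={ep:.3f}")
# For active learning, candidate inputs would be scored by `epistemic` alone.
```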
The authors then extend model-based RL by adding a risk term that accounts for both the aleatoric and the epistemic components, corresponding to noise aversion and model-bias aversion respectively. Experiments on the Industrial Benchmark show that the method prevents the policy from overfitting to the learned model and transfers better to the real system, but it appears to be quite sensitive to the choice of $\beta$ and $\gamma$.
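As I understand it, the risk-sensitive criterion is roughly of the following form (my notation, hedged; the exact weighting scheme is in the paper):

$$\min_\pi \;\; \mathbb{E}\left[C(\pi)\right] \;+\; \beta \left( \gamma\, \hat{\sigma}_{\text{epistemic}}(C) + (1-\gamma)\, \hat{\sigma}_{\text{aleatoric}}(C) \right),$$

where $\beta$ sets the overall degree of risk aversion and $\gamma$ trades off model-bias aversion (penalizing policies that exploit regions where the learned dynamics are epistemically uncertain) against noise aversion. This also makes the reported sensitivity plausible: $\beta$ and $\gamma$ directly rescale the two risk components against the expected cost.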