First published: 2013/12/20

Abstract: How can we perform efficient inference and learning in directed probabilistic
models, in the presence of continuous latent variables with intractable
posterior distributions, and large datasets? We introduce a stochastic
variational inference and learning algorithm that scales to large datasets and,
under some mild differentiability conditions, even works in the intractable
case. Our contributions are two-fold. First, we show that a reparameterization
of the variational lower bound yields a lower bound estimator that can be
straightforwardly optimized using standard stochastic gradient methods. Second,
we show that for i.i.d. datasets with continuous latent variables per
datapoint, posterior inference can be made especially efficient by fitting an
approximate inference model (also called a recognition model) to the
intractable posterior using the proposed lower bound estimator. Theoretical
advantages are reflected in experimental results.

#### Problem addressed:
Variational learning of Bayesian networks
#### Summary:
This paper presents a generic method for learning belief networks with continuous latent variables. It maximizes a variational lower bound on the marginal likelihood instead of the likelihood itself.
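For reference, the lower bound in question can be written as follows. The notation is an assumption added here, since the summary itself fixes none: $q_\phi(z \mid x)$ is the recognition model, $p_\theta(x \mid z)$ the generative model, and $p_\theta(z)$ the prior over the latent variables.

$$
\log p_\theta(x) \;\ge\; \mathcal{L}(\theta, \phi; x)
= -\,\mathrm{KL}\!\left(q_\phi(z \mid x)\,\|\,p_\theta(z)\right)
+ \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
$$

The gap between the two sides is $\mathrm{KL}\!\left(q_\phi(z \mid x)\,\|\,p_\theta(z \mid x)\right)$, which is non-negative, so the bound is tight only when the recognition model matches the true posterior.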
#### Novelty:
Uses a reparameterization trick that rewrites each random latent variable as a deterministic function of the model parameters plus an independent noise term, so that standard gradient-based learning can be applied.
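A minimal NumPy sketch of the trick, assuming a diagonal Gaussian recognition model (a common choice, but an assumption here). The function names `reparameterize` and `gaussian_kl` are hypothetical and are not taken from the linked implementation.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z ~ N(mu, diag(exp(log_var))) as z = mu + sigma * eps.

    The randomness lives entirely in eps, which does not depend on
    mu or log_var, so z is a differentiable function of both.
    """
    eps = rng.standard_normal(mu.shape)   # noise term, eps ~ N(0, I)
    sigma = np.exp(0.5 * log_var)         # standard deviation
    return mu + sigma * eps, eps

def gaussian_kl(mu, log_var):
    """Closed-form KL(N(mu, diag(sigma^2)) || N(0, I)), summed over latent dims."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

# Illustrative values only: a 2-dimensional latent, many samples.
rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
log_var = np.array([0.0, np.log(0.25)])   # sigma = [1.0, 0.5]
n = 100_000
z, eps = reparameterize(np.broadcast_to(mu, (n, 2)),
                        np.broadcast_to(log_var, (n, 2)), rng)
```

Because `z` is an explicit function of `mu` and `log_var`, an automatic differentiation framework can backpropagate through the sample, which is what lets the lower bound be optimized with ordinary stochastic gradient methods.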
#### Drawbacks:
The marginal likelihood of the resulting model is still intractable, so the method may not be well suited to applications that need the actual values of the marginal probabilities.
#### Datasets:
MNIST, Frey face
#### Additional remarks:
Experimentally compared with the wake-sleep algorithm on the log-likelihood lower bound as well as on the estimated marginal likelihood.
#### Resources:
Implementation: https://github.com/y0ast/Variational-Autoencoder
#### Presenter:
Yingbo Zhou