Bounding the Test Log-Likelihood of Generative Models
Yoshua Bengio, Li Yao, and Kyunghyun Cho
arXiv e-Print archive, 2013
Keywords:
cs.LG
First published: 2013/11/24
Abstract: Several interesting generative learning algorithms involve a complex probability distribution over many random variables, involving intractable normalization constants or latent variable marginalization. Some of them may not even have an analytic expression for the unnormalized probability function and no tractable approximation. This makes it difficult to estimate the quality of these models, once they have been trained, or to monitor their quality (e.g. for early stopping) while training. A previously proposed method is based on constructing a non-parametric density estimator of the model's probability function from samples generated by the model. We revisit this idea, propose a more efficient estimator, and prove that it provides a lower bound on the true test log-likelihood, and an unbiased estimator as the number of generated samples goes to infinity, although one that incorporates the effect of poor mixing. We further propose a biased variant of the estimator that can be used reliably with a finite number of samples for the purpose of model comparison.
#### Problem addressed:
Evaluation and comparison of generative models
#### Summary:
This paper improves upon an existing non-parametric estimator by sampling the latent (hidden) variables instead of the observed features. The authors present an estimator and prove that it converges to the true test log-likelihood as the number of generated samples goes to infinity, i.e. it is asymptotically unbiased. They also prove that the expected value of the finite-sample estimator is a lower bound on the true test log-likelihood. They further present a biased variant with a different sampling scheme that can be used reliably for model comparison with a finite number of samples. They empirically validate their estimators on the MNIST dataset across several generative models.
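For intuition, here is a minimal sketch of the estimator described above (all names are illustrative, not from the paper): given latent samples h_i generated by the model, average the tractable conditionals P(x|h_i) and take the log. By Jensen's inequality, the log of this average is, in expectation, a lower bound on log p(x).

```python
import numpy as np

def sampling_based_log_likelihood(x, h_samples, log_p_x_given_h):
    """Sketch of the sampling-based log-likelihood estimate of log p(x).

    Given latent samples h_1..h_N drawn from the model (e.g. by running
    its Markov chain), estimate
        log p_hat(x) = log( (1/N) * sum_i p(x | h_i) ),
    computed stably with the log-sum-exp trick. By Jensen's inequality,
    E[log p_hat(x)] <= log p(x), so in expectation this lower-bounds the
    true log-likelihood, and it becomes unbiased as N -> infinity.
    """
    # log p(x | h_i) for each sampled latent configuration
    # (assumed tractable, as the method requires)
    log_terms = np.array([log_p_x_given_h(x, h) for h in h_samples])
    m = log_terms.max()
    # log-mean-exp: m + log( (1/N) * sum_i exp(log_terms_i - m) )
    return m + np.log(np.mean(np.exp(log_terms - m)))
```

Averaging this quantity over a test set would give the bound on the test log-likelihood; how the h_i are drawn (e.g. as consecutive states of the model's Gibbs chain) is what lets the estimate incorporate the effect of poor mixing.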
#### Novelty:
Sampling in the latent (hidden) space rather than the observation space for non-parametric density estimation
#### Drawbacks:
This method works only for models that have latent variables, and its application to deep networks is unclear. The procedure for sampling the latent variables is not made explicit, and the method assumes that P(x|h) can be computed easily from the model.
#### Datasets:
MNIST
#### Resources:
paper: http://arxiv.org/pdf/1311.6184v4.pdf
#### Presenter:
Bhargava U. Kota