Generative Adversarial Nets
Goodfellow, Ian J.; Pouget-Abadie, Jean; Mirza, Mehdi; Xu, Bing; Warde-Farley, David; Ozair, Sherjil; Courville, Aaron C.; Bengio, Yoshua
Neural Information Processing Systems (NIPS), 2014
# Generative Adversarial Nets
## Introduction
* The paper proposes an adversarial framework for estimating generative models: a generative model learns to capture the data distribution, while a discriminative model learns to distinguish samples produced by the generative model from samples of the true data distribution.
* [Link to the paper](https://arxiv.org/abs/1406.2661)
## Adversarial Net
* Two models: a Generative Model (*G*) and a Discriminative Model (*D*), both multi-layer perceptrons.
* *G* takes as input a noise variable *z* and outputs a data sample *x = G(z)*.
* *D* takes as input a data sample *x* and predicts the probability that it came from the true data rather than from *G*.
* *G* tries to minimise *log(1 - D(G(z)))* while *D* tries to maximise the probability of classifying samples correctly (the full minimax objective is written out after this list).
* This can be viewed as a two-player minimax game whose global optimum is reached when *G* reproduces the data distribution exactly and *D* cannot distinguish the samples, always returning 0.5 as the probability that a sample came from the true data.
* Alternate between *k* steps of training *D* and 1 step of training *G*, so that *D* is maintained near its optimal solution (see the training sketch after this list).
* Early in training, when *G* is weak, *D* can reject generated samples with high confidence, so *log(1 - D(G(z)))* saturates and provides weak gradients. Instead, train *G* to maximise *log(D(G(z)))*.
* The paper gives a theoretical proof that the global optimum of the minimax game is achieved exactly when the generator's distribution matches the data distribution.
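Written out, the minimax objective the bullets above describe is the value function from the paper:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right] +
  \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```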
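A minimal PyTorch sketch of the alternating training procedure (Algorithm 1 in the paper). Layer sizes, learning rates, and the plain-SGD optimiser are illustrative assumptions rather than the paper's exact settings; the generator update uses the *log(D(G(z)))* trick noted above.

```python
import torch
import torch.nn as nn

# Hypothetical sizes; the paper's architectures varied per dataset.
Z_DIM, X_DIM, HIDDEN, K = 100, 784, 256, 1

# Generator: noise z -> sample x.  ReLU hidden layer and sigmoid output,
# matching the activations listed in the Experiments section.
G = nn.Sequential(nn.Linear(Z_DIM, HIDDEN), nn.ReLU(),
                  nn.Linear(HIDDEN, X_DIM), nn.Sigmoid())

# Discriminator: sample x -> probability that x came from the true data.
# (The paper used maxout + dropout; plain ReLU is a simplification here.)
D = nn.Sequential(nn.Linear(X_DIM, HIDDEN), nn.ReLU(),
                  nn.Linear(HIDDEN, 1), nn.Sigmoid())

opt_d = torch.optim.SGD(D.parameters(), lr=1e-3)
opt_g = torch.optim.SGD(G.parameters(), lr=1e-3)

def train_step(real_batch):
    """One outer iteration: k discriminator updates, then one generator update."""
    batch = real_batch.size(0)
    for _ in range(K):
        z = torch.randn(batch, Z_DIM)
        fake = G(z).detach()  # detach: do not backprop into G on D's step
        # Ascend E[log D(x)] + E[log(1 - D(G(z)))] by descending its negative.
        loss_d = -(torch.log(D(real_batch)).mean()
                   + torch.log(1 - D(fake)).mean())
        opt_d.zero_grad()
        loss_d.backward()
        opt_d.step()

    # Generator update: maximise log D(G(z)) rather than minimising
    # log(1 - D(G(z))), which saturates when G is still weak.
    z = torch.randn(batch, Z_DIM)
    loss_g = -torch.log(D(G(z))).mean()
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
```

Calling `train_step` repeatedly on batches of flattened, [0, 1]-scaled images gives the basic alternation; optimiser choice and schedules are left open here.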
## Experiments
* Datasets
  * MNIST, the Toronto Face Database, and CIFAR-10.
* The generator uses ReLU and sigmoid activations.
* The discriminator uses maxout activations and dropout.
* Evaluation Metric
  * Fit a Gaussian Parzen window to samples obtained from *G* and compare log-likelihoods on held-out test data.
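A sketch of that metric, assuming generated samples and test points are flattened vectors; `parzen_log_likelihood` is a hypothetical helper, and the bandwidth `sigma` would be chosen by cross-validation on a validation set, as in the paper:

```python
import numpy as np

def parzen_log_likelihood(samples, test_points, sigma):
    """Mean log-likelihood of test_points under a Gaussian Parzen window
    (kernel density estimate) fit to samples drawn from G."""
    n, d = samples.shape
    log_liks = []
    for x in test_points:
        # Squared distance from x to every generated sample.
        sq_dists = ((x - samples) ** 2).sum(axis=1)
        log_kernels = -sq_dists / (2.0 * sigma ** 2)
        # log of the mean kernel value, computed stably via log-sum-exp.
        m = log_kernels.max()
        ll = m + np.log(np.exp(log_kernels - m).mean())
        # Gaussian normalisation constant for d dimensions.
        ll -= (d / 2.0) * np.log(2.0 * np.pi * sigma ** 2)
        log_liks.append(ll)
    return float(np.mean(log_liks))
```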
## Strengths
* Computational advantages
  * Backpropagation suffices for training; no Markov chains or approximate inference are needed.
  * A wide variety of functions can be incorporated into the model.
* Since *G* is trained only with gradients flowing through *D*, components of the input data are not copied directly into *G*'s parameters.
* Can represent sharp (even degenerate) distributions.
## Weaknesses
* *D* must be kept well synchronised with *G* during training; in particular, *G* must not be trained too much without updating *D*, or it may collapse many values of *z* to the same output *x*.
* While *G* may learn to sample data points that are indistinguishable from the true data, there is no explicit representation of the distribution it has learned.
## Possible Extensions
* Conditional generative models.
* Inference network to predict *z* given *x*.
* A stochastic extension of the deterministic [Multi-Prediction Deep Boltzmann Machines](https://papers.nips.cc/paper/5024-multi-prediction-deep-boltzmann-machines.pdf).
* Using the discriminator net or an inference net for feature extraction.
* Accelerating training by coordinating *G* and *D* better, or by finding better distributions from which to sample *z* during training.