Generative Adversarial Nets
Goodfellow, Ian J.; Pouget-Abadie, Jean; Mirza, Mehdi; Xu, Bing; Warde-Farley, David; Ozair, Sherjil; Courville, Aaron C.; Bengio, Yoshua
Neural Information Processing Systems (NIPS), 2014
# Generative Adversarial Nets
## Introduction
* The paper proposes an adversarial framework for estimating generative models: a generative model learns to capture the data distribution, while a discriminative model learns to distinguish samples produced by the generative model from samples of the true data distribution.
* [Link to the paper](https://arxiv.org/abs/1406.2661)
## Adversarial Net
* Two models: a Generative Model (*G*) and a Discriminative Model (*D*), both multi-layer perceptrons.
* *G* takes as input a noise variable *z* and outputs a data sample *x = G(z)*.
* *D* takes as input a data sample *x* and predicts the probability that it came from the true data rather than from *G*.
* *G* tries to minimise *log(1 - D(G(z)))* while *D* tries to maximise the probability of classifying samples correctly (the full minimax objective is written out after this list).
* This can be viewed as a two-player minimax game whose global optimum is reached when *G* reproduces the data distribution exactly and *D* cannot distinguish the samples, always returning 0.5 as the probability that a sample came from the true data.
* Alternate between *k* steps of training *D* and 1 step of training *G*, so that *D* is maintained near its optimal solution (see the training sketch after this list).
* Early in training, when *G* is weak, *D* can reject generated samples with high confidence, so *log(1 - D(G(z)))* saturates and provides weak gradients. Instead, train *G* to maximise *log(D(G(z)))*.
* The paper gives a theoretical proof that the global optimum of the minimax game is achieved exactly when the generator's distribution matches the data distribution.
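Written out, the minimax objective the bullets above describe is the value function from the paper:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right] +
  \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```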
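A minimal PyTorch sketch of the alternating training procedure (Algorithm 1 in the paper). Layer sizes, learning rates, and the plain-SGD optimiser are illustrative assumptions rather than the paper's exact settings; the generator update uses the *log(D(G(z)))* trick noted above.

```python
import torch
import torch.nn as nn

# Hypothetical sizes; the paper's architectures varied per dataset.
Z_DIM, X_DIM, HIDDEN, K = 100, 784, 256, 1

# Generator: noise z -> sample x.  ReLU hidden layer and sigmoid output,
# matching the activations listed in the Experiments section.
G = nn.Sequential(nn.Linear(Z_DIM, HIDDEN), nn.ReLU(),
                  nn.Linear(HIDDEN, X_DIM), nn.Sigmoid())

# Discriminator: sample x -> probability that x came from the true data.
# (The paper used maxout + dropout; plain ReLU is a simplification here.)
D = nn.Sequential(nn.Linear(X_DIM, HIDDEN), nn.ReLU(),
                  nn.Linear(HIDDEN, 1), nn.Sigmoid())

opt_d = torch.optim.SGD(D.parameters(), lr=1e-3)
opt_g = torch.optim.SGD(G.parameters(), lr=1e-3)

def train_step(real_batch):
    """One outer iteration: k discriminator updates, then one generator update."""
    batch = real_batch.size(0)
    for _ in range(K):
        z = torch.randn(batch, Z_DIM)
        fake = G(z).detach()  # detach: do not backprop into G on D's step
        # Ascend E[log D(x)] + E[log(1 - D(G(z)))] by descending its negative.
        loss_d = -(torch.log(D(real_batch)).mean()
                   + torch.log(1 - D(fake)).mean())
        opt_d.zero_grad()
        loss_d.backward()
        opt_d.step()

    # Generator update: maximise log D(G(z)) rather than minimising
    # log(1 - D(G(z))), which saturates when G is still weak.
    z = torch.randn(batch, Z_DIM)
    loss_g = -torch.log(D(G(z))).mean()
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
```

Calling `train_step` repeatedly on batches of flattened, [0, 1]-scaled images gives the basic alternation; optimiser choice and schedules are left open here.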
## Experiments
* Datasets
  * MNIST, the Toronto Face Database, and CIFAR-10.
* The generator uses ReLU and sigmoid activations.
* The discriminator uses maxout activations and dropout.
* Evaluation Metric
  * Fit a Gaussian Parzen window to samples obtained from *G* and compare log-likelihoods on held-out test data.
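A sketch of that metric, assuming generated samples and test points are flattened vectors; `parzen_log_likelihood` is a hypothetical helper, and the bandwidth `sigma` would be chosen by cross-validation on a validation set, as in the paper:

```python
import numpy as np

def parzen_log_likelihood(samples, test_points, sigma):
    """Mean log-likelihood of test_points under a Gaussian Parzen window
    (kernel density estimate) fit to samples drawn from G."""
    n, d = samples.shape
    log_liks = []
    for x in test_points:
        # Squared distance from x to every generated sample.
        sq_dists = ((x - samples) ** 2).sum(axis=1)
        log_kernels = -sq_dists / (2.0 * sigma ** 2)
        # log of the mean kernel value, computed stably via log-sum-exp.
        m = log_kernels.max()
        ll = m + np.log(np.exp(log_kernels - m).mean())
        # Gaussian normalisation constant for d dimensions.
        ll -= (d / 2.0) * np.log(2.0 * np.pi * sigma ** 2)
        log_liks.append(ll)
    return float(np.mean(log_liks))
```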
## Strengths
* Computational advantages
  * Backpropagation suffices for training; no Markov chains or approximate inference are needed.
  * A wide variety of functions can be incorporated into the model.
* Since *G* is trained only with gradients flowing through *D*, components of the input data are not copied directly into *G*'s parameters.
* Can represent sharp (even degenerate) distributions.
## Weaknesses
* *D* must be kept well synchronised with *G* during training; in particular, *G* must not be trained too much without updating *D*, or it may collapse many values of *z* to the same output *x*.
* While *G* may learn to sample data points that are indistinguishable from the true data, there is no explicit representation of the distribution it has learned.
## Possible Extensions
* Conditional generative models.
* Inference network to predict *z* given *x*.
* A stochastic extension of the deterministic [Multi-Prediction Deep Boltzmann Machines](https://papers.nips.cc/paper/5024-multi-prediction-deep-boltzmann-machines.pdf).
* Using the discriminator net or an inference net for feature extraction.
* Accelerating training by coordinating *G* and *D* better, or by finding better distributions from which to sample *z* during training.