Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space
Anh Nguyen, Jason Yosinski, Yoshua Bengio, Alexey Dosovitskiy and Jeff Clune
arXiv e-Print archive - 2016
Keywords: cs.CV
First published: 2016/11/30
Abstract: Generating high-resolution, photo-realistic images has been a long-standing
goal in machine learning. Recently, Nguyen et al. (2016) showed one interesting
way to synthesize novel images by performing gradient ascent in the latent
space of a generator network to maximize the activations of one or multiple
neurons in a separate classifier network. In this paper we extend this method
by introducing an additional prior on the latent code, improving both sample
quality and sample diversity, leading to a state-of-the-art generative model
that produces high quality images at higher resolutions (227x227) than previous
generative models, and does so for all 1000 ImageNet categories. In addition,
we provide a unified probabilistic interpretation of related activation
maximization methods and call the general class of models "Plug and Play
Generative Networks". PPGNs are composed of 1) a generator network G that is
capable of drawing a wide range of image types and 2) a replaceable "condition"
network C that tells the generator what to draw. We demonstrate the generation
of images conditioned on a class (when C is an ImageNet or MIT Places
classification network) and also conditioned on a caption (when C is an image
captioning network). Our method also improves the state of the art of
Multifaceted Feature Visualization, which generates the set of synthetic inputs
that activate a neuron in order to better understand how deep neural networks
operate. Finally, we show that our model performs reasonably well at the task
of image inpainting. While image models are used in this paper, the approach is
modality-agnostic and can be applied to many types of data.
_Objective:_ Find a generative model that avoids the usual shortcomings, i.e. one that produces (i) high-resolution images, (ii) a variety of images and (iii) samples matching the diversity of the dataset.
_Dataset:_ [ImageNet](https://www.image-net.org/)
## Inner-workings:
The idea is to find an image that maximizes the probability of a given label, using a variant of a Markov Chain Monte Carlo (MCMC) sampler.
[![screen shot 2017-06-01 at 12 31 14 pm](https://cloud.githubusercontent.com/assets/17261080/26675978/3c9e6d94-46c6-11e7-9f67-477c4036a891.png)](https://cloud.githubusercontent.com/assets/17261080/26675978/3c9e6d94-46c6-11e7-9f67-477c4036a891.png)
Where the first term ensures that we stay on the natural-image manifold (so that we don't just produce adversarial examples) and the second term ensures that we find an image corresponding to the label we're looking for.
In practice, we start from a random image and iteratively update it to better match the label we're trying to generate.
### MALA-approx:
MALA-approx is the MCMC sampler, based on the Metropolis-Adjusted Langevin Algorithm, that they use in the paper. It is defined iteratively as follows (a minimal code sketch is given after the list of terms):
[![screen shot 2017-06-01 at 12 25 45 pm](https://cloud.githubusercontent.com/assets/17261080/26675866/bf15cc28-46c5-11e7-9620-659d26f84bf8.png)](https://cloud.githubusercontent.com/assets/17261080/26675866/bf15cc28-46c5-11e7-9620-659d26f84bf8.png)
where:
* epsilon1 scales the prior term, pulling the sample towards generic, plausible images.
* epsilon2 scales the condition term, increasing confidence in the chosen class.
* epsilon3 scales the added noise, which encourages diversity.
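As a concrete illustration, here is a minimal NumPy sketch of one such update step. This is a sketch under assumptions, not the paper's implementation: the two gradient terms are passed in as callables (in the paper the prior gradient comes from a DAE and the condition gradient from the classifier), and the toy usage below uses made-up stand-ins.

```python
import numpy as np

def mala_approx_step(x, grad_log_prior, grad_log_condition, eps1, eps2, eps3, rng):
    """One MALA-approx update (no accept/reject step):
    x_{t+1} = x_t + eps1 * dlogp(x)/dx + eps2 * dlogp(y|x)/dx + N(0, eps3^2)."""
    noise = rng.normal(0.0, eps3, size=x.shape)
    return x + eps1 * grad_log_prior(x) + eps2 * grad_log_condition(x) + noise

# Toy usage: standard-Gaussian "prior"; the condition pulls x towards a target.
rng = np.random.default_rng(0)
x, target = np.zeros(4), np.ones(4)
for _ in range(1000):
    x = mala_approx_step(x,
                         grad_log_prior=lambda x: -x,            # dlog N(0, I)/dx
                         grad_log_condition=lambda x: target - x,
                         eps1=1e-2, eps2=1e-1, eps3=1e-3, rng=rng)
print(x)  # hovers near target * eps2 / (eps1 + eps2)
```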
### Image prior:
They try several priors for the images:
1. PPGN-x: p(x) is modeled with a Denoising Auto-Encoder (DAE).
[![screen shot 2017-06-01 at 1 48 33 pm](https://cloud.githubusercontent.com/assets/17261080/26678501/1737c64e-46d1-11e7-82a4-7ee0aa8bfe2f.png)](https://cloud.githubusercontent.com/assets/17261080/26678501/1737c64e-46d1-11e7-82a4-7ee0aa8bfe2f.png)
2. DGN-AM: models x through a latent code h and a generator network G trained with a GAN objective, but without a learned prior on h.
[![screen shot 2017-06-01 at 1 49 41 pm](https://cloud.githubusercontent.com/assets/17261080/26678517/2e743194-46d1-11e7-95dc-9bb638128242.png)](https://cloud.githubusercontent.com/assets/17261080/26678517/2e743194-46d1-11e7-95dc-9bb638128242.png)
3. PPGN-h: incorporates a learned prior p(h) by training a DAE on the latent code h (a sampling sketch for this variant follows the list).
[![screen shot 2017-06-01 at 1 51 14 pm](https://cloud.githubusercontent.com/assets/17261080/26678579/6bd8cb58-46d1-11e7-895d-f9432b7e5e1f.png)](https://cloud.githubusercontent.com/assets/17261080/26678579/6bd8cb58-46d1-11e7-895d-f9432b7e5e1f.png)
4. Joint PPGN-h: to increase the expressivity of G, h is modeled with a DAE that goes through image space (h → x → h), so the generator and the encoder jointly act as the DAE.
[![screen shot 2017-06-01 at 1 51 23 pm](https://cloud.githubusercontent.com/assets/17261080/26678622/a7bf2f68-46d1-11e7-9209-98f97e0a218d.png)](https://cloud.githubusercontent.com/assets/17261080/26678622/a7bf2f68-46d1-11e7-9209-98f97e0a218d.png)
5. Noiseless joint PPGN-h: same as previous but without noise.
[![screen shot 2017-06-01 at 1 54 11 pm](https://cloud.githubusercontent.com/assets/17261080/26678655/d5499220-46d1-11e7-93d0-d48a6b6fa1a8.png)](https://cloud.githubusercontent.com/assets/17261080/26678655/d5499220-46d1-11e7-93d0-d48a6b6fa1a8.png)
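To make the sampling concrete, below is a minimal PyTorch sketch of one PPGN-h update (the variant with a DAE prior on h). The tiny `G`, `C` and `DAE` modules are toy stand-ins for the pretrained networks, and the epsilon defaults only mirror the order of magnitude reported in the paper; everything else is an illustrative assumption.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
H, X, NUM_CLASSES = 32, 64, 10  # toy sizes, not the paper's

G   = nn.Sequential(nn.Linear(H, X), nn.ReLU(), nn.Linear(X, X))  # generator: h -> x
C   = nn.Linear(X, NUM_CLASSES)                                   # condition net: x -> logits
DAE = nn.Sequential(nn.Linear(H, H), nn.Tanh(), nn.Linear(H, H))  # denoiser modeling p(h)

def ppgn_h_step(h, class_idx, eps1=1e-5, eps2=1.0, eps3=1e-17):
    """h_{t+1} = h_t + eps1*(R(h_t) - h_t) + eps2 * dlogp(y|G(h_t))/dh + noise.

    The DAE residual R(h) - h approximates dlogp(h)/dh (Alain & Bengio, 2014).
    """
    h = h.detach().requires_grad_(True)
    log_p_y = torch.log_softmax(C(G(h)), dim=-1)[:, class_idx].sum()
    (grad_cond,) = torch.autograd.grad(log_p_y, h)
    with torch.no_grad():
        return (h + eps1 * (DAE(h) - h)           # prior term
                  + eps2 * grad_cond              # condition term
                  + eps3 * torch.randn_like(h))   # noise term

h = torch.randn(1, H)
for _ in range(200):
    h = ppgn_h_step(h, class_idx=3)
x = G(h)  # the sampled (toy) "image"
```

The noiseless joint PPGN-h simply drops the noise term (eps3 → 0).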
### Conditioning:
In the paper they mostly condition on class labels, but captions (or pretty much any differentiable condition network) can be used instead; a minimal illustration of swapping the condition network follows the figure.
[![screen shot 2017-06-01 at 2 26 53 pm](https://cloud.githubusercontent.com/assets/17261080/26679654/6297ab86-46d6-11e7-86fa-f763face01ca.png)](https://cloud.githubusercontent.com/assets/17261080/26679654/6297ab86-46d6-11e7-86fa-f763face01ca.png)
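To illustrate the "plug and play" aspect, here is a hypothetical swap of the condition network, reusing the toy `G` and sizes from the sketch above: only the gradient of the condition term changes, and the sampler itself is untouched.

```python
# Hypothetical stand-in for a captioning (or any other differentiable) scorer.
caption_scorer = nn.Linear(X, 1)  # pretend it computes log p(caption | x)

def grad_log_caption(h):
    h = h.detach().requires_grad_(True)
    score = caption_scorer(G(h)).sum()
    return torch.autograd.grad(score, h)[0]
```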
## Architecture:
The final architecture, built around a pretrained classifier network, is shown below. Note that only G and D are trained; the encoder E and the condition network C are pretrained and kept fixed.
[![screen shot 2017-06-01 at 2 29 49 pm](https://cloud.githubusercontent.com/assets/17261080/26679785/db143520-46d6-11e7-9668-72864f1a8eb1.png)](https://cloud.githubusercontent.com/assets/17261080/26679785/db143520-46d6-11e7-9668-72864f1a8eb1.png)
## Results:
Pretty much any base network can be plugged in with minimal training of G and D. The model produces very realistic images with great diversity; see below for examples of 227x227 images generated for ImageNet classes.
[![screen shot 2017-06-01 at 2 32 38 pm](https://cloud.githubusercontent.com/assets/17261080/26679884/4494002a-46d7-11e7-882e-c69aff2ddd17.png)](https://cloud.githubusercontent.com/assets/17261080/26679884/4494002a-46d7-11e7-882e-c69aff2ddd17.png)