Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space
Anh Nguyen, Jason Yosinski, Yoshua Bengio, Alexey Dosovitskiy and Jeff Clune
arXiv e-Print archive - 2016
Keywords: cs.CV
First published: 2016/11/30
Abstract: Generating high-resolution, photo-realistic images has been a long-standing
goal in machine learning. Recently, Nguyen et al. (2016) showed one interesting
way to synthesize novel images by performing gradient ascent in the latent
space of a generator network to maximize the activations of one or multiple
neurons in a separate classifier network. In this paper we extend this method
by introducing an additional prior on the latent code, improving both sample
quality and sample diversity, leading to a state-of-the-art generative model
that produces high quality images at higher resolutions (227x227) than previous
generative models, and does so for all 1000 ImageNet categories. In addition,
we provide a unified probabilistic interpretation of related activation
maximization methods and call the general class of models "Plug and Play
Generative Networks". PPGNs are composed of 1) a generator network G that is
capable of drawing a wide range of image types and 2) a replaceable "condition"
network C that tells the generator what to draw. We demonstrate the generation
of images conditioned on a class (when C is an ImageNet or MIT Places
classification network) and also conditioned on a caption (when C is an image
captioning network). Our method also improves the state of the art of
Multifaceted Feature Visualization, which generates the set of synthetic inputs
that activate a neuron in order to better understand how deep neural networks
operate. Finally, we show that our model performs reasonably well at the task
of image inpainting. While image models are used in this paper, the approach is
modality-agnostic and can be applied to many types of data.
_Objective:_ Find a generative model that avoids the usual shortcomings, i.e. one that produces (i) high-resolution images, (ii) a variety of images and (iii) samples matching the diversity of the dataset.
_Dataset:_ [ImageNet](https://www.image-net.org/)
## Inner-workings:
The idea is to find an image that maximizes the probability of a given label, using a variant of a Markov Chain Monte Carlo (MCMC) sampler.
[![screen shot 2017-06-01 at 12 31 14 pm](https://cloud.githubusercontent.com/assets/17261080/26675978/3c9e6d94-46c6-11e7-9f67-477c4036a891.png)](https://cloud.githubusercontent.com/assets/17261080/26675978/3c9e6d94-46c6-11e7-9f67-477c4036a891.png)
Where the first term ensures that we stay on the natural-image manifold (so that we don't just produce adversarial examples) and the second term ensures that we find an image corresponding to the label we're looking for.
In practice, we start from a random image and iteratively update it to better match the label we're trying to generate.
### MALA-approx:
MALA-approx is the MCMC sampler, based on the Metropolis-Adjusted Langevin Algorithm, that they use in the paper. It is defined iteratively as follows (a minimal code sketch is given after the list of terms):
[![screen shot 2017-06-01 at 12 25 45 pm](https://cloud.githubusercontent.com/assets/17261080/26675866/bf15cc28-46c5-11e7-9620-659d26f84bf8.png)](https://cloud.githubusercontent.com/assets/17261080/26675866/bf15cc28-46c5-11e7-9620-659d26f84bf8.png)
where:
* epsilon1 scales the prior term, pulling the sample towards generic, plausible images.
* epsilon2 scales the condition term, increasing confidence in the chosen class.
* epsilon3 scales the added noise, which encourages diversity.
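As a concrete illustration, here is a minimal NumPy sketch of one such update step. This is a sketch under assumptions, not the paper's implementation: the two gradient terms are passed in as callables (in the paper the prior gradient comes from a DAE and the condition gradient from the classifier), and the toy usage below uses made-up stand-ins.

```python
import numpy as np

def mala_approx_step(x, grad_log_prior, grad_log_condition, eps1, eps2, eps3, rng):
    """One MALA-approx update (no accept/reject step):
    x_{t+1} = x_t + eps1 * dlogp(x)/dx + eps2 * dlogp(y|x)/dx + N(0, eps3^2)."""
    noise = rng.normal(0.0, eps3, size=x.shape)
    return x + eps1 * grad_log_prior(x) + eps2 * grad_log_condition(x) + noise

# Toy usage: standard-Gaussian "prior"; the condition pulls x towards a target.
rng = np.random.default_rng(0)
x, target = np.zeros(4), np.ones(4)
for _ in range(1000):
    x = mala_approx_step(x,
                         grad_log_prior=lambda x: -x,            # dlog N(0, I)/dx
                         grad_log_condition=lambda x: target - x,
                         eps1=1e-2, eps2=1e-1, eps3=1e-3, rng=rng)
print(x)  # hovers near target * eps2 / (eps1 + eps2)
```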
### Image prior:
They try several priors for the images:
1. PPGN-x: p(x) is modeled with a Denoising Auto-Encoder (DAE).
[![screen shot 2017-06-01 at 1 48 33 pm](https://cloud.githubusercontent.com/assets/17261080/26678501/1737c64e-46d1-11e7-82a4-7ee0aa8bfe2f.png)](https://cloud.githubusercontent.com/assets/17261080/26678501/1737c64e-46d1-11e7-82a4-7ee0aa8bfe2f.png)
2. DGN-AM: models x through a latent code h and a generator network G trained with a GAN objective, but without a learned prior on h.
[![screen shot 2017-06-01 at 1 49 41 pm](https://cloud.githubusercontent.com/assets/17261080/26678517/2e743194-46d1-11e7-95dc-9bb638128242.png)](https://cloud.githubusercontent.com/assets/17261080/26678517/2e743194-46d1-11e7-95dc-9bb638128242.png)
3. PPGN-h: incorporates a learned prior p(h) by training a DAE on the latent code h (a sampling sketch for this variant follows the list).
[![screen shot 2017-06-01 at 1 51 14 pm](https://cloud.githubusercontent.com/assets/17261080/26678579/6bd8cb58-46d1-11e7-895d-f9432b7e5e1f.png)](https://cloud.githubusercontent.com/assets/17261080/26678579/6bd8cb58-46d1-11e7-895d-f9432b7e5e1f.png)
4. Joint PPGN-h: to increase the expressivity of G, h is modeled with a DAE that goes through image space (h → x → h), so the generator and the encoder jointly act as the DAE.
[![screen shot 2017-06-01 at 1 51 23 pm](https://cloud.githubusercontent.com/assets/17261080/26678622/a7bf2f68-46d1-11e7-9209-98f97e0a218d.png)](https://cloud.githubusercontent.com/assets/17261080/26678622/a7bf2f68-46d1-11e7-9209-98f97e0a218d.png)
5. Noiseless joint PPGN-h: same as previous but without noise.
[![screen shot 2017-06-01 at 1 54 11 pm](https://cloud.githubusercontent.com/assets/17261080/26678655/d5499220-46d1-11e7-93d0-d48a6b6fa1a8.png)](https://cloud.githubusercontent.com/assets/17261080/26678655/d5499220-46d1-11e7-93d0-d48a6b6fa1a8.png)
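To make the sampling concrete, below is a minimal PyTorch sketch of one PPGN-h update (the variant with a DAE prior on h). The tiny `G`, `C` and `DAE` modules are toy stand-ins for the pretrained networks, and the epsilon defaults only mirror the order of magnitude reported in the paper; everything else is an illustrative assumption.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
H, X, NUM_CLASSES = 32, 64, 10  # toy sizes, not the paper's

G   = nn.Sequential(nn.Linear(H, X), nn.ReLU(), nn.Linear(X, X))  # generator: h -> x
C   = nn.Linear(X, NUM_CLASSES)                                   # condition net: x -> logits
DAE = nn.Sequential(nn.Linear(H, H), nn.Tanh(), nn.Linear(H, H))  # denoiser modeling p(h)

def ppgn_h_step(h, class_idx, eps1=1e-5, eps2=1.0, eps3=1e-17):
    """h_{t+1} = h_t + eps1*(R(h_t) - h_t) + eps2 * dlogp(y|G(h_t))/dh + noise.

    The DAE residual R(h) - h approximates dlogp(h)/dh (Alain & Bengio, 2014).
    """
    h = h.detach().requires_grad_(True)
    log_p_y = torch.log_softmax(C(G(h)), dim=-1)[:, class_idx].sum()
    (grad_cond,) = torch.autograd.grad(log_p_y, h)
    with torch.no_grad():
        return (h + eps1 * (DAE(h) - h)           # prior term
                  + eps2 * grad_cond              # condition term
                  + eps3 * torch.randn_like(h))   # noise term

h = torch.randn(1, H)
for _ in range(200):
    h = ppgn_h_step(h, class_idx=3)
x = G(h)  # the sampled (toy) "image"
```

The noiseless joint PPGN-h simply drops the noise term (eps3 → 0).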
### Conditioning:
In the paper they mostly condition on class labels, but captions (or pretty much any differentiable condition network) can be used instead; a minimal illustration of swapping the condition network follows the figure.
[![screen shot 2017-06-01 at 2 26 53 pm](https://cloud.githubusercontent.com/assets/17261080/26679654/6297ab86-46d6-11e7-86fa-f763face01ca.png)](https://cloud.githubusercontent.com/assets/17261080/26679654/6297ab86-46d6-11e7-86fa-f763face01ca.png)
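To illustrate the "plug and play" aspect, here is a hypothetical swap of the condition network, reusing the toy `G` and sizes from the sketch above: only the gradient of the condition term changes, and the sampler itself is untouched.

```python
# Hypothetical stand-in for a captioning (or any other differentiable) scorer.
caption_scorer = nn.Linear(X, 1)  # pretend it computes log p(caption | x)

def grad_log_caption(h):
    h = h.detach().requires_grad_(True)
    score = caption_scorer(G(h)).sum()
    return torch.autograd.grad(score, h)[0]
```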
## Architecture:
The final architecture, built around a pretrained classifier network, is shown below. Note that only G and D are trained; the encoder E and the condition network C are pretrained and kept fixed.
[![screen shot 2017-06-01 at 2 29 49 pm](https://cloud.githubusercontent.com/assets/17261080/26679785/db143520-46d6-11e7-9668-72864f1a8eb1.png)](https://cloud.githubusercontent.com/assets/17261080/26679785/db143520-46d6-11e7-9668-72864f1a8eb1.png)
## Results:
Pretty much any base network can be plugged in with minimal training of G and D. The model produces very realistic images with great diversity; see below for examples of 227x227 images generated for ImageNet classes.
[![screen shot 2017-06-01 at 2 32 38 pm](https://cloud.githubusercontent.com/assets/17261080/26679884/4494002a-46d7-11e7-882e-c69aff2ddd17.png)](https://cloud.githubusercontent.com/assets/17261080/26679884/4494002a-46d7-11e7-882e-c69aff2ddd17.png)