_Objective:_ Find a generative model that avoids the usual shortcomings, producing (i) high-resolution images, (ii) a variety of images and (iii) samples that match the diversity of the dataset.

_Dataset:_ [ImageNet](https://www.image-net.org/)

## Inner-workings:

The idea is to find an image that maximizes the probability of a given label by using a variant of a Markov Chain Monte Carlo (MCMC) sampler.

[](https://cloud.githubusercontent.com/assets/17261080/26675978/3c9e6d94-46c6-11e7-9f67-477c4036a891.png)

The first term ensures that we stay on the image manifold we are trying to model and don't just produce adversarial examples. The second term makes sure that we find an image corresponding to the label we are looking for.

Basically, we start from a random image and iteratively find a better image to match the label we are trying to generate.

### MALA-approx:

MALA-approx is the MCMC sampler used in the paper. It is based on the Metropolis-Adjusted Langevin Algorithm and is defined iteratively as follows:

[](https://cloud.githubusercontent.com/assets/17261080/26675866/bf15cc28-46c5-11e7-9620-659d26f84bf8.png)

where:

* epsilon1 makes the image more generic.
* epsilon2 increases confidence in the chosen class.
* epsilon3 adds noise to encourage diversity.

### Image prior:

They try several priors for the images:

1. PPGN-x: p(x) is modeled with a Denoising Auto-Encoder (DAE).

   [](https://cloud.githubusercontent.com/assets/17261080/26678501/1737c64e-46d1-11e7-82a4-7ee0aa8bfe2f.png)
2. DGN-AM: uses a latent space, modeling x through h with a GAN.

   [](https://cloud.githubusercontent.com/assets/17261080/26678517/2e743194-46d1-11e7-95dc-9bb638128242.png)
3. PPGN-h: incorporates a prior for p(h) using a DAE.

   [](https://cloud.githubusercontent.com/assets/17261080/26678579/6bd8cb58-46d1-11e7-895d-f9432b7e5e1f.png)
4. Joint PPGN-h: to increase the expressivity of G, model h by first modeling x in the DAE.

   [](https://cloud.githubusercontent.com/assets/17261080/26678622/a7bf2f68-46d1-11e7-9209-98f97e0a218d.png)
5. Noiseless joint PPGN-h: same as the previous one but without noise.

   [](https://cloud.githubusercontent.com/assets/17261080/26678655/d5499220-46d1-11e7-93d0-d48a6b6fa1a8.png)

### Conditioning:

In the paper they mostly condition on the label, but captions or pretty much anything else can also be used.

[](https://cloud.githubusercontent.com/assets/17261080/26679654/6297ab86-46d6-11e7-86fa-f763face01ca.png)

## Architecture:

The final architecture, which uses a pretrained classifier network, is shown below. Note that only G and D are trained.

[](https://cloud.githubusercontent.com/assets/17261080/26679785/db143520-46d6-11e7-9668-72864f1a8eb1.png)

## Results:

Pretty much any base network can be used with minimal training of G and D. It produces very realistic images with great diversity; see below for examples of 227x227 images with ImageNet.

[](https://cloud.githubusercontent.com/assets/17261080/26679884/4494002a-46d7-11e7-882e-c69aff2ddd17.png)
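
To make the three-term update from the MALA-approx section more concrete, here is a minimal toy sketch in Python. It is not the paper's implementation. It assumes a standard Gaussian in place of the learned image prior (so the prior gradient is simply `-x`), a hand-picked linear softmax classifier `W` in place of the pretrained network, and a 2-D vector in place of an image. All function names and epsilon values are hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D array of logits.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def grad_log_prior(x):
    # Assumption: toy prior p(x) = N(0, I), so grad log p(x) = -x.
    # In the summary above this role is played by a learned prior (e.g. a DAE).
    return -x

def grad_log_class(x, W, c):
    # Gradient of log p(y=c | x) for a linear softmax classifier with weights W.
    p = softmax(W @ x)
    return W[c] - p @ W

def mala_approx(x0, W, c, eps1, eps2, eps3, n_steps, rng):
    # Iterates x <- x + eps1 * prior term + eps2 * class term + Gaussian noise (std eps3).
    x = x0.copy()
    for _ in range(n_steps):
        noise = rng.normal(0.0, eps3, size=x.shape)
        x = x + eps1 * grad_log_prior(x) + eps2 * grad_log_class(x, W, c) + noise
    return x

rng = np.random.default_rng(0)
# Hypothetical 3-class classifier on 2-D inputs.
W = np.array([[2.0, 0.0], [-2.0, 0.0], [0.0, 2.0]])
x0 = rng.normal(size=2)  # start from a random "image"
x = mala_approx(x0, W, c=2, eps1=0.02, eps2=0.2, eps3=0.01, n_steps=300, rng=rng)
print(softmax(W @ x))  # class probabilities at the final sample
```

In this toy setting, raising `eps2` relative to `eps1` pushes the sample further toward the target class, while raising `eps3` spreads the samples out, which mirrors the roles of the three epsilons listed above.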