Generating Images with Recurrent Adversarial Networks
* What
* They describe a new architecture for GANs.
* The architecture is based on letting the Generator (G) create images in multiple steps, similar to DRAW.
* They also briefly suggest a method to compare the quality of the results of different generators with each other.
* How
* In a classic GAN one samples a noise vector `z`, feeds that into a Generator (`G`), which then generates an image `x`, which is then fed through the Discriminator (`D`) to estimate its quality.
* Their method operates in basically the same way, but internally G generates the image over multiple time steps.
* Outline of how their G operates (a code sketch follows the architecture image below):
* Time step 0:
* Input: Empty image `delta C-1`, randomly sampled `z`.
* Feed `delta C-1` through a number of downsampling convolutions to create a tensor. (Not very useful here, as the image is empty. More useful in later timesteps.)
* Feed `z` through a number of upsampling convolutions to create a tensor (similar to DCGAN).
* Concat the output of the previous two steps.
* Feed that concatenation through a few more convolutions.
* Output: `delta C0` (changes to apply to the empty starting canvas).
* Time step 1 (and later):
* Input: Previous change `delta C0`, randomly sampled `z` (can be the same as in step 0).
* Feed `delta C0` through a number of downsampling convolutions to create a tensor.
* Feed `z` through a number of upsampling convolutions to create a tensor (similar to DCGAN).
* Concat the output of the previous two steps.
* Feed that concatenation through a few more convolutions.
* Output: `delta C1` (further changes to apply on top of the canvas produced so far).
* At the end, after all timesteps have been performed:
* Create final output image by summing all the changes, i.e. `delta C0 + delta C1 + ...`, which basically means `empty start canvas + changes from time step 0 + changes from time step 1 + ...`.
* Their architecture as an image:
* ![Architecture](https://raw.githubusercontent.com/aleju/papers/master/neural-nets/images/Generating_Images_with_Recurrent_Adversarial_Networks__architecture.png?raw=true "Architecture")
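* A minimal PyTorch sketch of one such generator step plus the summation loop, as referenced above. All layer sizes, strides and the 64x64 image size are illustrative assumptions, and `GeneratorStep`/`generate` are hypothetical names, not the paper's exact configuration:

  ```python
  # Sketch of a GRAN-style generator step (hypothetical sizes/names).
  import torch
  import torch.nn as nn

  class GeneratorStep(nn.Module):
      def __init__(self, z_dim=100, ch=64):
          super().__init__()
          # Downsampling convolutions: encode the previous delta-canvas.
          self.encode = nn.Sequential(
              nn.Conv2d(3, ch, 4, stride=2, padding=1),       # 64x64 -> 32x32
              nn.LeakyReLU(0.2),
              nn.Conv2d(ch, 2 * ch, 4, stride=2, padding=1),  # 32x32 -> 16x16
              nn.LeakyReLU(0.2),
          )
          # Upsampling convolutions: decode z to the same spatial size (DCGAN-style).
          self.decode_z = nn.Sequential(
              nn.ConvTranspose2d(z_dim, 2 * ch, 4, stride=1, padding=0),   # 1x1 -> 4x4
              nn.ReLU(),
              nn.ConvTranspose2d(2 * ch, 2 * ch, 4, stride=4, padding=0),  # 4x4 -> 16x16
              nn.ReLU(),
          )
          # A few more convolutions on the concatenation, back up to image size.
          self.merge = nn.Sequential(
              nn.ConvTranspose2d(4 * ch, ch, 4, stride=2, padding=1),  # 16x16 -> 32x32
              nn.ReLU(),
              nn.ConvTranspose2d(ch, 3, 4, stride=2, padding=1),       # 32x32 -> 64x64
          )

      def forward(self, prev_delta, z):
          enc = self.encode(prev_delta)                     # (N, 2ch, 16, 16)
          dec = self.decode_z(z.view(z.size(0), -1, 1, 1))  # (N, 2ch, 16, 16)
          return self.merge(torch.cat([enc, dec], dim=1))   # next delta C_t

  def generate(step, z, n_steps=5, size=64):
      delta = torch.zeros(z.size(0), 3, size, size)  # "empty image" delta C-1
      canvas = torch.zeros_like(delta)               # empty starting canvas
      for _ in range(n_steps):
          delta = step(delta, z)   # z is sampled once and reused at every step
          canvas = canvas + delta  # final image = delta C0 + delta C1 + ...
      return canvas

  # Usage: G = GeneratorStep(); imgs = generate(G, torch.randn(8, 100))
  ```

* Note that the same `GeneratorStep` module (i.e. shared weights) is applied at every time step, which is what makes the generator recurrent.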
* Comparison measure
* They suggest a new method to compare GAN results with each other.
* They suggest training pairs of G and D, e.g. two pairs (G1, D1) and (G2, D2), and then letting the pairs compete with each other.
* To estimate the quality of the discriminators they suggest `r_test = errorRate(D1, testset) / errorRate(D2, testset)`, where the error rate is the fraction of real test images that a D misclassifies as fake. ("Which D is better at spotting that the test set images are real images?")
* To estimate the quality of the generated samples they suggest `r_sample = errorRate(D1, images by G2) / errorRate(D2, images by G1)`, where the error rate is the fraction of generated images that a D misclassifies as real. ("Which G is better at fooling an unknown D, i.e. possibly better at generating realistic images?")
* They suggest estimating which G is better using `r_sample` and then estimating how valid that comparison is using `r_test`, which should be close to 1 (i.e. both Ds should perform similarly on real data). See the sketch below.
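* A minimal sketch of such a battle, as referenced above. It assumes each D outputs the probability that its inputs are real (thresholded at 0.5) and each G maps a noise batch to an image batch; all helper names are hypothetical:

  ```python
  # Sketch of the suggested "GAN battle" (hypothetical helper names; assumes
  # each D outputs the probability that its input images are real).
  import torch

  def error_rate_real(D, real_images):
      # Fraction of real images that D wrongly classifies as fake.
      with torch.no_grad():
          return (D(real_images) < 0.5).float().mean().item()

  def error_rate_fake(D, fake_images):
      # Fraction of generated images that D wrongly classifies as real.
      with torch.no_grad():
          return (D(fake_images) >= 0.5).float().mean().item()

  def battle(G1, D1, G2, D2, test_images, z):
      # r_test: are both discriminators similarly good on real images?
      # Values far from 1 make the r_sample comparison less trustworthy.
      r_test = error_rate_real(D1, test_images) / error_rate_real(D2, test_images)
      # r_sample: each G is judged by the *other* pair's discriminator.
      r_sample = error_rate_fake(D1, G2(z)) / error_rate_fake(D2, G1(z))
      return r_test, r_sample
  ```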
* Results
* Generated images of churches, with timesteps 1 to 5:
* ![Churches](https://raw.githubusercontent.com/aleju/papers/master/neural-nets/images/Generating_Images_with_Recurrent_Adversarial_Networks__churches.jpg?raw=true "Churches")
* Overfitting
* They saw no indication of overfitting in the sense of memorizing images from the training dataset.
* However, they saw some indication of G merely interpolating between a few good images and of G reusing small image patches across different images.
* Randomness of noise vector `z`:
* Sampling the noise vector once seems to be better than resampling it at every time step.
* Resampling it at every time step often led to very similar-looking output images.