BEGAN: Boundary Equilibrium Generative Adversarial Networks
David Berthelot, Thomas Schumm, Luke Metz
arXiv e-Print archive - 2017
Keywords:
cs.LG, stat.ML
First published: 2017/03/31. Abstract: We propose a new equilibrium enforcing method paired with a loss derived from
the Wasserstein distance for training auto-encoder based Generative Adversarial
Networks. This method balances the generator and discriminator during training.
Additionally, it provides a new approximate convergence measure, fast and
stable training and high visual quality. We also derive a way of controlling
the trade-off between image diversity and visual quality. We focus on the image
generation task, setting a new milestone in visual quality, even at higher
resolutions. This is achieved while using a relatively simple model
architecture and a standard training procedure.
_Objective:_ Improve GAN convergence toward more diverse and visually pleasing images at higher resolutions, using a novel equilibrium method between the discriminator and the generator that also simplifies the training procedure.
_Dataset:_ [LFW](http://vis-www.cs.umass.edu/lfw/)
## Inner workings:
They match the distribution of the autoencoder reconstruction errors (assumed to be approximately normally distributed) instead of matching the distribution of the samples directly. To do this, they compute the Wasserstein distance between the pixel-wise autoencoder loss distributions of real and generated samples, defined as follows:
1. Autoencoder loss:
[![screen shot 2017-04-24 at 3 46 32 pm](https://cloud.githubusercontent.com/assets/17261080/25340190/429f9788-2905-11e7-88dc-b44567b9cd34.png)](https://cloud.githubusercontent.com/assets/17261080/25340190/429f9788-2905-11e7-88dc-b44567b9cd34.png)
2. Wasserstein distance between two normal distributions μ1 = N(m1, C1) and μ2 = N(m2, C2):
[![screen shot 2017-04-24 at 3 46 44 pm](https://cloud.githubusercontent.com/assets/17261080/25340191/42b23474-2905-11e7-9810-58d5326bf886.png)](https://cloud.githubusercontent.com/assets/17261080/25340191/42b23474-2905-11e7-9810-58d5326bf886.png)
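Since the equations above are screenshots, a minimal numeric sketch may help. The autoencoder loss in the paper is L(v) = |v − D(v)|^η with η ∈ {1, 2}, and for 1-D normals the squared 2-Wasserstein distance has the closed form below, which the authors lower-bound by the squared difference of the means (the function name here is illustrative, not from the paper):

```python
import math

def w2_squared_1d(m1: float, c1: float, m2: float, c2: float) -> float:
    """Squared 2-Wasserstein distance between 1-D normals N(m1, c1) and N(m2, c2)."""
    return (m1 - m2) ** 2 + c1 + c2 - 2.0 * math.sqrt(c1 * c2)

# The paper optimizes only the lower bound (m1 - m2)^2, so training needs
# just the mean autoencoder losses of the real and generated batches.
```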
They also introduce an equilibrium concept to handle the situation where `G` and `D` are not well balanced and the discriminator `D` wins too easily. The balance is controlled by what they call the diversity ratio, which trades off auto-encoding real images against discriminating real from generated images. It is defined as follows:
[![screen shot 2017-04-24 at 3 56 29 pm](https://cloud.githubusercontent.com/assets/17261080/25340609/992c2188-2906-11e7-8c51-498bbd293119.png)](https://cloud.githubusercontent.com/assets/17261080/25340609/992c2188-2906-11e7-8c51-498bbd293119.png)
To maintain this balance they use standard stochastic gradient descent but introduce a variable `kt`, initialized to 0, that controls how much emphasis is put on the generator `G`. This removes the need to alternate `x` steps on `D` with `y` steps on `G`, or to pre-train one of the two.
[![screen shot 2017-04-24 at 3 59 57 pm](https://cloud.githubusercontent.com/assets/17261080/25340859/4ee06476-2907-11e7-971f-90421449cb51.png)](https://cloud.githubusercontent.com/assets/17261080/25340859/4ee06476-2907-11e7-971f-90421449cb51.png)
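The balancing scheme above can be sketched roughly as follows. This is a minimal sketch, not the paper's code: `L_real` and `L_fake` stand for the batch-mean autoencoder losses L(x) and L(G(z)), `gamma` is the diversity ratio, and `lambda_k` is the step size for `kt` (0.001 in the paper):

```python
def began_step(L_real, L_fake, k_t, gamma=0.5, lambda_k=0.001):
    """One BEGAN balancing step on precomputed batch-mean autoencoder losses.

    L_real: mean autoencoder loss on real images, L(x)
    L_fake: mean autoencoder loss on generated images, L(G(z))
    k_t:    balancing variable in [0, 1], initialized to 0
    """
    loss_D = L_real - k_t * L_fake   # discriminator objective
    loss_G = L_fake                  # generator objective
    # Move k_t toward satisfying the equilibrium gamma = L_fake / L_real,
    # clamped to [0, 1].
    k_next = min(max(k_t + lambda_k * (gamma * L_real - L_fake), 0.0), 1.0)
    return loss_D, loss_G, k_next
```

When `D` is winning (`L_fake` small relative to `gamma * L_real`), `k_t` grows, putting more weight on fooling `D`; this is what replaces the usual hand-tuned alternation schedule.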
Finally, they use the equilibrium concept to derive a global convergence measure that can be used to determine when the network has reached its final state, or to detect model collapse:
[![screen shot 2017-04-24 at 4 04 12 pm](https://cloud.githubusercontent.com/assets/17261080/25340998/b8bf6ad6-2907-11e7-8afa-294cae32c6af.png)](https://cloud.githubusercontent.com/assets/17261080/25340998/b8bf6ad6-2907-11e7-8afa-294cae32c6af.png)
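As a sketch (using the same hypothetical names as above), the measure combines the real-image reconstruction loss with the absolute error of the equilibrium condition; lower values indicate a better-converged model:

```python
def convergence_measure(L_real: float, L_fake: float, gamma: float = 0.5) -> float:
    """M_global = L(x) + |gamma * L(x) - L(G(z))|; lower is better."""
    return L_real + abs(gamma * L_real - L_fake)
```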
## Architecture:
They deliberately kept the architecture simple in order to isolate the impact of their new equilibrium principle and loss: no batch normalization, no dropout, no transposed convolutions, and no exponential growth in the number of convolution filters.
[![screen shot 2017-04-24 at 4 09 29 pm](https://cloud.githubusercontent.com/assets/17261080/25341219/6fb7be28-2908-11e7-8774-287c1b7d7684.png)](https://cloud.githubusercontent.com/assets/17261080/25341219/6fb7be28-2908-11e7-8774-287c1b7d7684.png)
## Results:
They trained on images from 32x32 up to 256x256; at higher resolutions images tend to lose sharpness, but the results are nevertheless of very high visual quality.
[![screen shot 2017-04-24 at 4 20 30 pm](https://cloud.githubusercontent.com/assets/17261080/25341699/f99b0770-2909-11e7-84a0-3ac0436771e5.png)](https://cloud.githubusercontent.com/assets/17261080/25341699/f99b0770-2909-11e7-84a0-3ac0436771e5.png)