Banach Wasserstein GAN on ShortScience.org


Banach Wasserstein GAN
Jonas Adler and Sebastian Lunz
arXiv e-Print archive - 2018 via Local arXiv
Keywords: cs.CV, cs.LG, math.FA


[link] Summary by Artëm Sobolev 6 years ago

The paper extends [WGAN](http://www.shortscience.org/paper?bibtexKey=journals/corr/1701.07875) by replacing the L2 norm in the transportation cost with some other metric $d(x, y)$. Following the same reasoning as in the WGAN paper, one arrives at a dual optimization problem similar to WGAN's, except that the critic $f$ has to be 1-Lipschitz w.r.t. the given norm (rather than L2). This, in turn, means that the critic's gradient (w.r.t. the input $x$) has to be bounded in the dual norm (only in Banach spaces, hence the name). The authors build upon [WGAN-GP](http://www.shortscience.org/paper?bibtexKey=journals/corr/1704.00028) and add a similar gradient penalty term to enforce the critic's constraint.
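As a rough illustration of the dual-norm constraint, here is a minimal NumPy sketch for the simple case of an $L^p$ norm, whose dual is $L^q$ with $1/p + 1/q = 1$. It assumes the critic's input gradients are already computed by some autodiff framework. The function names and the penalty weight `lam=10.0` (the usual WGAN-GP default) are illustrative assumptions, and this is a simplified form, not necessarily the exact penalty used in the paper.

```python
import numpy as np

def dual_exponent(p):
    """Hoelder conjugate q with 1/p + 1/q = 1 (p=1 pairs with q=inf)."""
    if p == 1:
        return np.inf
    if np.isinf(p):
        return 1.0
    return p / (p - 1.0)

def lp_norm(v, p):
    """Discrete l^p norm of each sample in a batch (samples are flattened)."""
    flat = np.abs(v.reshape(v.shape[0], -1))
    if np.isinf(p):
        return flat.max(axis=1)
    return (flat ** p).sum(axis=1) ** (1.0 / p)

def dual_norm_gradient_penalty(critic_grads, p, lam=10.0):
    """WGAN-GP style penalty with the gradient measured in the dual norm.

    critic_grads: array of shape (batch, ...) holding d f / d x per sample
                  (assumed to come from an autodiff framework).
    p:            exponent of the norm the critic should be 1-Lipschitz in.
    lam:          penalty weight (hypothetical default, as in WGAN-GP).
    """
    q = dual_exponent(p)
    grad_norms = lp_norm(critic_grads, q)
    # Push the dual norm of the gradient towards 1, as WGAN-GP does for L2.
    return lam * np.mean((grad_norms - 1.0) ** 2)

# Toy usage: a linear critic f(x) = <w, x> has gradient w at every input.
w = np.array([0.5, 0.5])
grads = np.tile(w, (4, 1))
print(dual_norm_gradient_penalty(grads, p=np.inf))  # dual of l^inf is l^1
print(dual_norm_gradient_penalty(grads, p=2))       # l^2 is its own dual
```

With `p=2` this reduces to the ordinary WGAN-GP penalty, since $L^2$ is its own dual.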

In particular, the authors choose the [Sobolev norm](https://en.wikipedia.org/wiki/Sobolev_space#Multidimensional_case):
$$
||f||_{W^{s,p}} = \left( \int \sum_{k=0}^s ||\nabla^k f(x)||_{L_p}^p dx \right)^{1 / p}
$$

This norm is chosen because it forces not only the pixel values to be close, but the gradients as well. The gradients are small where the texture is smooth and large at the edges, so this loss can regulate how much you care about the edges. Alternatively, you could express the same norm by first applying the Fourier transform to $f$, then multiplying the result pointwise by $(1 + ||\xi||_{L_2}^2)^{s/2}$, where $\xi$ is the frequency variable, and then transforming back and integrating over the whole space:
$$
||f||_{W^{s,p}} = \left( \int \left( \mathcal{F}^{-1} \left[ (1 + ||\xi||_{L_2}^2)^{s/2} \mathcal{F}[f] (\xi) \right] (x) \right)^p dx \right)^{1 / p}
$$
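The Fourier-multiplier form of the norm can be sketched in a few lines of NumPy for a single-channel 2D image. This is only an illustration under assumptions: the function name is hypothetical, the frequency grid is taken in cycles per pixel (the scaling of the frequency variable is a convention), and the integral is replaced by a plain sum over pixels.

```python
import numpy as np

def sobolev_norm(img, s=1.0, p=2.0):
    """Discrete W^{s,p} norm of a 2D array via the Fourier multiplier.

    img: real-valued 2D array of pixel intensities.
    s:   smoothness order; s=0 gives the plain L^p norm.
    p:   integrability exponent.
    """
    h, w = img.shape
    # Frequency grid in cycles per pixel (an assumed convention).
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    # Pointwise multiplier (1 + |frequency|^2)^(s/2).
    weight = (1.0 + fx ** 2 + fy ** 2) ** (s / 2.0)
    # Transform, reweight frequencies, transform back.
    filtered = np.fft.ifft2(weight * np.fft.fft2(img)).real
    # Sum over pixels stands in for the integral over the image domain.
    return (np.abs(filtered) ** p).sum() ** (1.0 / p)

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))
print(sobolev_norm(image, s=0.0))  # plain L^2 norm
print(sobolev_norm(image, s=1.0))  # also penalizes high frequencies
```

Larger `s` puts more weight on high frequencies, i.e. on edges, while a constant image has the same norm for every `s` because only the zero-frequency component is non-zero.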

Here $f(x)$ would be the image's pixel intensities and $x$ the image coordinates, so $\nabla^k f(x)$ would be the spatial gradient. You don't have direct access to it, and it's a bit hard to estimate with finite differences, so the authors go for the second (Fourier) option. Luckily, the DFT is just a linear operator and fast implementations exist, so you can backpropagate through it (TensorFlow already includes `tf.spectral`).

The authors perform experiments on CIFAR and report state-of-the-art non-progressive results in terms of Inception Score (though not beating SNGANs by a statistically significant margin). The samples they present, however, are too small to tell whether the network really cared about the edges.
