Fader Networks: Manipulating Images by Sliding Attributes on ShortScience.org

Fader Networks: Manipulating Images by Sliding Attributes
Lample, Guillaume and Zeghidour, Neil and Usunier, Nicolas and Bordes, Antoine and Denoyer, Ludovic and Ranzato, Marc'Aurelio
arXiv e-Print archive - 2017 via Local Bibsonomy
Keywords: dblp


Summary by Sina Honari 7 years ago

This paper aims to change individual attributes of a face (for example adding or removing glasses, making a person look younger or older, or changing the gender) without altering other aspects of the image. The name Fader Networks comes from the sliders (faders) of audio mixing tools, which change a value linearly to increase or decrease a feature.

The model is shown below:

https://i.imgur.com/fntPmNu.png

An image $x$ is passed to the encoder, and the encoder output $E(x)$ is passed to the discriminator, which tries to predict the feature $y$ from the latent code. The encoded features $E(x)$ and the feature $y$ are then passed to the decoder to reconstruct the image, $D(E(x), y)$.

The AE therefore has two losses: 1- the reconstruction loss between $x$ and $D(E(x), y)$, and 2- the GAN loss for fooling the discriminator about the feature $y$ in the encoded space $E(x)$.
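The two objectives can be written out as a rough sketch. Here $D(E(x), y)$ denotes the decoder output with the attribute input made explicit, $P_{dis}(\cdot \mid E(x))$ is the discriminator's predicted probability for an attribute value, and $\lambda_E$ is a weight on the adversarial term; the symbols $P_{dis}$ and $\lambda_E$ are notation introduced here for illustration, for a single binary attribute $y \in \{0, 1\}$.

$$\mathcal{L}_{dis} = -\log P_{dis}(y \mid E(x))$$

$$\mathcal{L}_{AE} = \lVert D(E(x), y) - x \rVert_2^2 \; - \; \lambda_E \log P_{dis}(1 - y \mid E(x))$$

The discriminator minimizes $\mathcal{L}_{dis}$ (predict the true $y$), while the encoder and decoder minimize $\mathcal{L}_{AE}$ (reconstruct $x$ and make the discriminator predict the flipped attribute $1 - y$).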

The discriminator tries to predict the feature $y$ from the encoded space $E(x)$, while the encoder tries to fool the discriminator. This pushes the encoder to remove the feature $y$ from $E(x)$, so that ideally the encoded features $E(x)$ carry no information about $y$. However, since the decoder has to reconstruct the same input image, $E(x)$ must retain all information except the feature $y$, and the decoder has to get $y$ from its explicit input.
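The opposing goals of the discriminator and the encoder can be illustrated numerically. This is a minimal sketch for one binary attribute, assuming the discriminator outputs a probability `p_attr` that the attribute is present; the function names and the clamping constant are hypothetical and are not taken from the paper's code.

```python
import math

EPS = 1e-12  # hypothetical clamp to avoid log(0)


def discriminator_loss(p_attr: float, y: int) -> float:
    """Cross-entropy the discriminator minimizes: -log P(y | E(x)).

    p_attr is the predicted probability that the attribute is present (y = 1).
    """
    p_true = p_attr if y == 1 else 1.0 - p_attr
    return -math.log(max(p_true, EPS))


def encoder_adversarial_loss(p_attr: float, y: int) -> float:
    """Term the encoder minimizes: -log P(1 - y | E(x)).

    It is small when the discriminator assigns high probability to the
    wrong (flipped) attribute value, i.e. when the discriminator is fooled.
    """
    p_flipped = 1.0 - p_attr if y == 1 else p_attr
    return -math.log(max(p_flipped, EPS))


if __name__ == "__main__":
    # A confident, correct discriminator: good for it, bad for the encoder.
    print(discriminator_loss(0.9, 1), encoder_adversarial_loss(0.9, 1))
    # A fooled discriminator: the two losses swap roles.
    print(discriminator_loss(0.1, 1), encoder_adversarial_loss(0.1, 1))
```

When the discriminator is at chance (`p_attr = 0.5`), both terms are equal, which corresponds to a latent code that reveals nothing about $y$.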

The model is trained on binary features $y$ such as male/female, young/old, glasses yes/no, mouth open yes/no, and eyes open yes/no (some samples from the test set are shown below):
https://i.imgur.com/bj9wu6B.png

At test time, the feature values can be changed continuously to show a gradual transition in the features:
https://i.imgur.com/XUD3ZTu.png
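The continuous transition can be sketched as a sweep over the attribute value fed to the decoder. This is only an illustration, assuming each binary attribute is passed to the decoder as a two-value code `(1 - alpha, alpha)` where `alpha = 0` means "absent" and `alpha = 1` means "present"; the helper names are hypothetical and the decoder itself is not shown.

```python
from typing import List, Tuple


def attribute_code(alpha: float) -> Tuple[float, float]:
    """Return (weight_absent, weight_present) for a slider position alpha."""
    return (1.0 - alpha, alpha)


def fader_sweep(start: float, stop: float, steps: int) -> List[Tuple[float, float]]:
    """Evenly spaced attribute codes from `start` to `stop`, inclusive."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [
        attribute_code(start + (stop - start) * i / (steps - 1))
        for i in range(steps)
    ]


if __name__ == "__main__":
    # Each code would be paired with the same latent E(x) and decoded,
    # giving one frame of the transition per code.
    for code in fader_sweep(0.0, 1.0, 5):
        print(code)
```

Decoding the same $E(x)$ with each code in turn would give the row of images in a transition figure like the one above.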

The performance of the model is measured by Amazon Mechanical Turk workers on two metrics: the naturalness of the images and the accuracy of swapping features in the image. On both, FadNet shows better results than IcGAN. FadNet does very well on accuracy, but its naturalness drops when some features are swapped.

On the Flowers dataset, FadNet can change the colors of the flowers:
https://i.imgur.com/7nvBSEY.png

I find the following positive aspects about FadNet:

1- It can change some features while preserving others, such as the identity of the person and the background.

2- The model does not need paired data. In some cases it is impossible to gather paired data (e.g. male/female) or very difficult (young/old).

3- The GAN loss is used to remove a feature from the latent space, so that the feature can later be specified for reconstruction by the decoder. Since the GAN is applied to the latent space, it can also be used to remove features when the data is discrete (where applying a discriminator directly to the data is not trivial).

I think these aspects need further work for improvement:
- When multiple features are changed, the images become blurry:
https://i.imgur.com/LD5cVbg.png
When only one feature changes, the blurriness effect is much less pronounced, even though they use an L2 loss for AE reconstruction. I guess that using a high resolution of $256 \times 256$ also helps make the blurriness of the images less noticeable.

- The model should first be trained as a plain AE (no GAN loss), after which the weight of the GAN loss in the AE objective is linearly increased to remove a feature. So, training it properly requires a bit of care.
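The warm-up described in the last point can be sketched as a linear ramp on the weight of the GAN term in the AE loss. This is a minimal sketch; the function name and the default values for `ramp_steps` and `lambda_max` are illustrative placeholders, not values taken from this summary.

```python
def adversarial_weight(step: int,
                       ramp_steps: int = 500_000,
                       lambda_max: float = 1e-4) -> float:
    """Weight of the GAN term in the AE loss at a given training step.

    Starts at 0 (pure autoencoder), grows linearly, and stays at
    lambda_max once `ramp_steps` iterations have passed.
    """
    if ramp_steps <= 0:
        raise ValueError("ramp_steps must be positive")
    fraction = min(max(step, 0) / ramp_steps, 1.0)
    return lambda_max * fraction


if __name__ == "__main__":
    for step in (0, 100_000, 250_000, 500_000, 1_000_000):
        print(step, adversarial_weight(step))
```

In a training loop, the AE loss at each step would be the reconstruction loss plus `adversarial_weight(step)` times the GAN term, so early iterations behave like a plain autoencoder.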

Overall, I find it an interesting paper on how to change one feature of an image while keeping the other features unchanged.
