Representation Learning by Rotating Your Faces
Tran, Luan and Yin, Xi and Liu, Xiaoming
arXiv e-Print archive - 2017 via Local Bibsonomy
Keywords: dblp
This paper takes a face image and rotates it to any desired pose by passing the target pose as an additional input to the model.
https://i.imgur.com/AGNOag5.png
They use a GAN (named DR-GAN) for face rotation. The generator has an encoder and a decoder: the encoder maps the input image to a high-level identity feature representation, and the decoder takes these high-level features, the target pose code, and a noise vector to generate the output image with the rotated face.
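A minimal sketch of this encoder-decoder generator, assuming PyTorch; the layer stacks and the feature, pose, and noise dimensions here are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Gen-enc: image -> high-level identity feature f(x)."""
    def __init__(self, feat_dim=320):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ELU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ELU(),
            nn.Conv2d(128, feat_dim, 4, stride=2, padding=1), nn.ELU(),
            nn.AdaptiveAvgPool2d(1),   # global pooling -> one feature vector per image
        )

    def forward(self, x):
        return self.conv(x).flatten(1)   # (B, feat_dim)

class Decoder(nn.Module):
    """Gen-dec: identity feature + target pose code + noise -> rotated face."""
    def __init__(self, feat_dim=320, pose_dim=13, noise_dim=50):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + pose_dim + noise_dim, 128 * 8 * 8), nn.ELU(),
            nn.Unflatten(1, (128, 8, 8)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ELU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, f, pose_code, noise):
        # Concatenate identity feature, target pose code, and noise before decoding.
        return self.net(torch.cat([f, pose_code, noise], dim=1))
```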
The generated image is then passed to a discriminator, which says whether the image is real or fake. The discriminator also has two other outputs: 1) it estimates the pose of the generated image, 2) it estimates the identity of the person.
No direct (pixel-level) loss is applied to the generator; it is trained by the gradients it receives through the discriminator, minimizing three objectives: 1) the GAN loss (to fool the discriminator), 2) pose estimation, 3) identity estimation.
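A hedged sketch of how the discriminator heads and the generator objective could look. I split real/fake, pose, and identity into three separate heads to mirror the description above, and the class counts and equal loss weighting are my assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Discriminator(nn.Module):
    def __init__(self, n_identities=500, n_poses=13):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ELU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ELU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.real_fake = nn.Linear(128, 1)             # GAN head
        self.pose = nn.Linear(128, n_poses)            # pose-estimation head
        self.identity = nn.Linear(128, n_identities)   # identity head

    def forward(self, img):
        h = self.backbone(img)
        return self.real_fake(h), self.pose(h), self.identity(h)

def generator_loss(disc, fake_img, target_pose, identity_label):
    """All generator gradients flow back through the discriminator."""
    rf, pose_logits, id_logits = disc(fake_img)
    adv = F.binary_cross_entropy_with_logits(rf, torch.ones_like(rf))  # fool the disc
    pose = F.cross_entropy(pose_logits, target_pose)      # match the target pose
    ident = F.cross_entropy(id_logits, identity_label)    # preserve the identity
    return adv + pose + ident
```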
They use two tricks to improve the model:
1- using the same parameters for the encoder of the generator (gen-enc) and the discriminator (they observe this helps identity recognition),
2- passing two images to gen-enc, interpolating between their high-level features (the gen-enc outputs), and then applying two costs on the decoded result: 1) a GAN loss, 2) a pose loss. These losses are applied through the discriminator, as above (see the sketch below).
The first trick improves gen-enc and the second trick improves gen-dec; both help identification.
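A small sketch of the second trick, reusing the modules from the sketches above; the interpolation weight and the variable names are mine, and only the GAN and pose losses are applied since the interpolated feature has no identity label:

```python
import torch
import torch.nn.functional as F

def interpolation_step(enc, dec, disc, img_a, img_b, pose_code, pose_label, noise, alpha=0.5):
    """Mix the gen-enc features of two images, decode, train with GAN + pose losses only."""
    f = alpha * enc(img_a) + (1 - alpha) * enc(img_b)   # interpolate high-level features
    fake = dec(f, pose_code, noise)
    rf, pose_logits, _ = disc(fake)
    adv = F.binary_cross_entropy_with_logits(rf, torch.ones_like(rf))  # GAN loss
    pose = F.cross_entropy(pose_logits, pose_label)                    # pose loss
    return adv + pose
```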
Their model can also leverage multiple images of the same identity, if the dataset provides them, to get a better latent representation in gen-enc for a given identity.
https://i.imgur.com/23Tckqc.png
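A sketch of one plausible fusion scheme for this multi-image case: each image's gen-enc feature is weighted by a predicted confidence coefficient and the fused representation is their weighted average. The coefficient head and the softmax normalization are my assumptions about the details:

```python
import torch

def fuse_identity_features(enc, coeff_head, images):
    """Fuse several images of one identity into a single gen-enc representation."""
    feats = [enc(img) for img in images]        # per-image feature f(x), each (B, D)
    coeffs = [coeff_head(f) for f in feats]     # per-image confidence, each (B, 1)
    feats, coeffs = torch.stack(feats), torch.stack(coeffs)
    weights = torch.softmax(coeffs, dim=0)      # normalize across the N images
    return (weights * feats).sum(dim=0)         # coefficient-weighted average
```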
These are some samples of face frontalization:
https://i.imgur.com/zmCODXe.png
and these are some samples of interpolating different factors in the latent space: (sub-fig a) interpolating the identity feature f(x) between two images, (sub-fig b) interpolating the pose code, (sub-fig c) interpolating the noise:
https://i.imgur.com/KlkVyp9.png
I find the following aspects of the paper positive:
1) face rotation is applied to in-the-wild images, 2) paired training data is not required, 3) multiple source images of the same identity can be used if available, 4) identity and pose are used cleverly in the discriminator to guide the generator, 5) the model can specify the target pose (it is not limited to face frontalization).
Negative aspects:
1) The generated faces have many artifacts, similar to the artifacts of some other GAN models.
2) The identity is not well preserved, and the faces sometimes look distorted compared to the original person.
They show the model's performance on identity recognition and face rotation and demonstrate compelling results.