Conditional Image Generation with PixelCNN Decoders
Aaron van den Oord, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt, Alex Graves, Koray Kavukcuoglu
arXiv e-Print archive, 2016
Keywords: cs.CV, cs.LG
First published: 2016/06/16
Abstract: This work explores conditional image generation with a new image density
model based on the PixelCNN architecture. The model can be conditioned on any
vector, including descriptive labels or tags, or latent embeddings created by
other networks. When conditioned on class labels from the ImageNet database,
the model is able to generate diverse, realistic scenes representing distinct
animals, objects, landscapes and structures. When conditioned on an embedding
produced by a convolutional network given a single image of an unseen face, it
generates a variety of new portraits of the same person with different facial
expressions, poses and lighting conditions. We also show that conditional
PixelCNN can serve as a powerful decoder in an image autoencoder. Additionally,
the gated convolutional layers in the proposed model improve the log-likelihood
of PixelCNN to match the state-of-the-art performance of PixelRNN on ImageNet,
with greatly reduced computational cost.
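
The abstract mentions two ideas that are easier to picture in code: conditioning on an arbitrary vector and the gated convolutional layers. The following is a minimal NumPy sketch of a conditional gated activation, following the form given in the body of the paper, y = tanh(W_f * x + V_f^T h) ⊙ σ(W_g * x + V_g^T h). It is an illustration under stated assumptions, not the authors' implementation: the masked convolution outputs W_f * x and W_g * x are assumed to be precomputed and passed in as `conv_f` and `conv_g`, and the function name, argument names, and shapes are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))


def conditional_gated_activation(conv_f, conv_g, h, V_f, V_g):
    """Gated activation with a location-independent conditioning term.

    conv_f, conv_g : (H, W, C) arrays, assumed to be the outputs of the
                     masked "filter" and "gate" convolutions (not computed here).
    h              : (D,) conditioning vector, e.g. a one-hot class label
                     or an embedding from another network.
    V_f, V_g       : (D, C) projection matrices (hypothetical names).
    """
    # Project the conditioning vector to one bias per feature map.
    bias_f = h @ V_f  # shape (C,)
    bias_g = h @ V_g  # shape (C,)
    # The (C,) biases broadcast over every spatial position, so the
    # conditioning is the same at every pixel.
    return np.tanh(conv_f + bias_f) * sigmoid(conv_g + bias_g)


# Toy usage with random stand-ins for the convolution outputs.
rng = np.random.default_rng(0)
H, W, C, D = 4, 4, 8, 10
conv_f = rng.normal(size=(H, W, C))
conv_g = rng.normal(size=(H, W, C))
h = np.zeros(D)
h[3] = 1.0  # one-hot "class 3" as the conditioning vector
V_f = rng.normal(size=(D, C))
V_g = rng.normal(size=(D, C))

out = conditional_gated_activation(conv_f, conv_g, h, V_f, V_g)
print(out.shape)
```

Because the projected vector is added as a per-channel bias, changing `h` shifts every spatial location in the same way. This matches the abstract's claim that the model can be conditioned on any vector, whether a label or a latent embedding.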