This paper introduces DRAW, a neural network architecture
that generates realistic images sequentially. The authors
also introduce a differentiable attention mechanism
that allows the network to focus on local regions of the image
during reconstruction. Main contributions:
- The network architecture is similar to other variational
  auto-encoders, except that:
  - The encoder and decoder are recurrent networks (LSTMs).
    The encoder's output is conditioned on the decoder's
    previous outputs, and the decoder's outputs are iteratively
    added to the resulting distribution from which images are
    generated.
  - The spatial attention mechanism restricts the input region
    observed by the encoder and the output region the decoder
    can write to.
- The spatial soft attention mechanism is effective and fully differentiable,
and can be used for other tasks.
- Images generated by DRAW look very realistic.
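
To make the attention and iterative-writing points above concrete, here is a minimal NumPy sketch. It assumes the attention window is an n x n grid of 1-D Gaussian filters, as described in the DRAW paper, and replaces the LSTM decoder with random patches. The function names (`filterbank`, `read`, `write`), the fixed attention parameters, and the 28 x 28 canvas size are illustrative choices for these notes, not the authors' code.

```python
import numpy as np

def filterbank(center, delta, sigma, n, size):
    """Return an (n, size) matrix of row-normalised 1-D Gaussian filters."""
    # Filter centres are spaced `delta` pixels apart, symmetric around `center`.
    mu = center + (np.arange(n) - (n - 1) / 2.0) * delta
    pos = np.arange(size)
    f = np.exp(-((pos[None, :] - mu[:, None]) ** 2) / (2.0 * sigma ** 2))
    # Normalise each filter so it sums to 1 (guarding against division by zero).
    return f / np.maximum(f.sum(axis=1, keepdims=True), 1e-8)

def read(image, fy, fx, gamma=1.0):
    """Extract an (n, n) glimpse from an (H, W) image."""
    return gamma * (fy @ image @ fx.T)

def write(patch, fy, fx, gamma=1.0):
    """Project an (n, n) patch back onto an (H, W) canvas."""
    return (fy.T @ patch @ fx) / gamma

# Toy setup (assumed values): 28x28 canvas, 5x5 attention window, 4 time steps.
H = W = 28
n, steps = 5, 4
rng = np.random.default_rng(0)

fy = filterbank(center=14.0, delta=2.0, sigma=1.0, n=n, size=H)
fx = filterbank(center=14.0, delta=2.0, sigma=1.0, n=n, size=W)

canvas = np.zeros((H, W))
for _ in range(steps):
    # Stand-in for the decoder output; in DRAW this comes from the LSTM state,
    # and the attention parameters would also be predicted at every step.
    patch = rng.normal(size=(n, n))
    canvas += write(patch, fy, fx)  # outputs are added, not overwritten

# The accumulated canvas parameterises the image distribution (here a
# per-pixel Bernoulli mean via a sigmoid).
image_mean = 1.0 / (1.0 + np.exp(-canvas))
glimpse = read(image_mean, fy, fx)
```

Every operation here is a matrix product or an elementwise function of `center`, `delta`, `sigma`, and `gamma`, which is why the mechanism can be trained with plain backpropagation.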
## Weaknesses / Notes