First published: 2017/02/15 (7 years ago) Abstract: We consider the general problem of modeling temporal data with long-range
dependencies, wherein new observations are fully or partially predictable based
on temporally-distant, past observations. A sufficiently powerful temporal
model should separate predictable elements of the sequence from unpredictable
elements, express uncertainty about those unpredictable elements, and rapidly
identify novel elements that may help to predict the future. To create such
models, we introduce Generative Temporal Models augmented with external memory
systems. They are developed within the variational inference framework, which
provides both a practical training methodology and methods to gain insight into
the models' operation. We show, on a range of problems with sparse, long-term
temporal dependencies, that these models store information from early in a
sequence, and reuse this stored information efficiently. This allows them to
perform substantially better than existing models based on well-known recurrent
neural networks, like LSTMs.
#### Very Brief Summary:
This paper combines stochastic variational inference with memory-augmented recurrent neural networks. The authors test 4 variants of their models against the Variational Recurrent Neural Network on 7 artificial tasks requiring long term memory. The reported log-likelihood lower bound is not obviously improved by the new models on all tasks but is slightly better on tasks requiring high capacity memory.
#### Slightly Less Brief Summary:
The authors propose a general class of generative models for time-series data with both deterministic and stochastic latents. The deterministic latents, $h_t$, evolve as a recurrent net with augmented memory and the stochastic latents, $z_t$ are gaussians whose mean and variance are a deterministic function of $h_t$. The observations at each time-step $x_t$ are also gaussians whose mean and variance are parametrised by a function of $h_{<t}, x_{<t}$.
#### Generative Temporal Models without Augmented Memory:
The family of generative temporal models is fairly broad and includes kalman filters, non-linear dynamical systems, hidden-markov models and switching state-space models. More recent non-linear models such as the variational RNN are most similar to the new models in this paper. In general all of the mentioned temporal models can be written as:
$P_\theta(x_{\leq T}, z_{\leq T} ) = \prod_t P_\theta(x_t | f_x(z_{\leq t}, x_{\leq t}))P_\theta(z_t | f_z(z_{\leq t}, x_{\leq t}))$
The differences between models then come from the the exact forms of $f_x$ and $f_z$ with most models making strong conditional independence assumptions and/or having linear dependence. For example in a Gaussian State Space model both $f_x$ and $f_z$ are linear, the latents form a first order Markov chain and the observations $x_t$ are conditionally independent of everything given $z_t$.
In the Variational Recurrent Neural Net (VRNN) an additional deterministic latent variable $h_t$ is introduced and at each time-step $x_t$ is the output of a VAE whose prior $z_t$ is conditioned on $h_t$. $h_t$ evolves as an RNN.
#### Types of Model with Augmented Memory:
This paper follows the same strategy as the VRNN but adds more structure to the underlying recurrent neural net. The authors motivate this by saying that the VRNN "scales poorly when higher capacity storage is required".
* "Introspective" Model: In the first augmented memory model, the deterministic latent M_t is simply a concatenation of the last $L$ latent stochastic variables $z_t$. A soft method of attention over the latent memory is used to generate a "memory context" vector at each time step. The observed output $x_t$ is a gaussian with mean and variance parameterised by the "memory context' and the stochastic latent $z_t$. Because this model does not learn to write to memory it is faster to train.
* In the later models the memory read and write operations are the same as those in the neural turing machine or differentiable neural computer.
#### My Two Cents:
In some senses this paper feels fairly inevitable since VAE's have already been married with RNNs and so it's a small leap to add augmented memory.
The actual read write operations introduced in the "introspective" model feel a little hacky and unprincipled.
The actual images generated are quite impressive.
I'd like to see how these kind of models do on language generation tasks and wether they can be adapted for question answering.