* The paper explains how to apply dropout to LSTMs and how it could reduce overfitting in tasks like language modelling, speech recognition, image caption generation and machine translation.
* [Link to the paper](https://arxiv.org/abs/1409.2329)
* Dropout is a regularisation method that drops out (temporarily removes) units from the network, along with all their incoming and outgoing connections.
* Conventional dropout does not work well with RNNs as the recurrence amplifies the noise and hurts learning.
* The paper proposes to apply dropout to only the non-recurrent connections.
* The dropout operator corrupts the information carried by some (but not all) units, forcing the network to perform its intermediate computations more robustly.
* The information is corrupted exactly L+1 times, where L is the number of layers; this count is independent of the number of timesteps the information traverses.
* In the context of language modelling, image caption generation, speech recognition and machine translation, dropout enables training larger networks and improves test performance, as measured by metrics such as perplexity and frame accuracy.
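
The idea of dropping only the non-recurrent connections can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation: the helper names (`dropout`, `lstm_step`, `run_stack`), the use of inverted dropout (rescaling at training time), the assumption that input and hidden sizes are both `n`, and the single fused weight matrix `W` of shape `(2n, 4n)` are all choices made here for brevity. Dropout is applied to the input of each layer and once more to the top-layer output, so a value passing up through `L` layers is corrupted `L+1` times, while the recurrent state `h_prev`, `c_prev` is never masked.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def dropout(x, p, rng, train=True):
    """Inverted dropout: zero each unit with probability p, rescale the rest."""
    if not train or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)


def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step; W maps concat([x, h_prev]) to the four gate pre-activations."""
    n = h_prev.shape[-1]
    z = np.concatenate([x, h_prev], axis=-1) @ W + b
    i = sigmoid(z[..., :n])           # input gate
    f = sigmoid(z[..., n:2 * n])      # forget gate
    o = sigmoid(z[..., 2 * n:3 * n])  # output gate
    g = np.tanh(z[..., 3 * n:])       # candidate cell update
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c


def run_stack(xs, params, p, rng, train=True):
    """Run a stacked LSTM over xs of shape (T, batch, n).

    params is a list with one (W, b) pair per layer (hypothetical layout).
    """
    T, batch, n = xs.shape
    hs = [np.zeros((batch, n)) for _ in params]
    cs = [np.zeros((batch, n)) for _ in params]
    outputs = []
    for t in range(T):
        inp = xs[t]
        for l, (W, b) in enumerate(params):
            # Dropout only on the non-recurrent input coming from the layer below.
            dropped = dropout(inp, p, rng, train)
            # The recurrent state hs[l], cs[l] is passed through untouched.
            hs[l], cs[l] = lstm_step(dropped, hs[l], cs[l], W, b)
            inp = hs[l]
        # One final dropout on the top-layer output: L + 1 applications in total.
        outputs.append(dropout(inp, p, rng, train))
    return np.stack(outputs)
```

Because the recurrent path is left intact, the amount of noise injected does not grow with sequence length, which is the property the notes above attribute to the method. In this sketch a fresh mask is sampled at every timestep and layer; other variants that reuse a mask across time are a different technique.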