First published: 2016/07/12 Abstract: Many sequential processing tasks require complex nonlinear transition
functions from one step to the next. However, recurrent neural networks with
such 'deep' transition functions remain difficult to train, even when using
Long Short-Term Memory networks. We introduce a novel theoretical analysis of
recurrent networks based on Ger\v{s}gorin's circle theorem that illuminates
several modeling and optimization issues and improves our understanding of the
LSTM cell. Based on this analysis we propose Recurrent Highway Networks (RHN),
which are long not only in time but also in space, generalizing LSTMs to larger
step-to-step depths. Experiments indicate that the proposed architecture
results in complex but efficient models, beating previous models for character
prediction on the Hutter Prize dataset with less than half of the parameters.

A multi-layer RNN in which the first layer is an LSTM; each following layer $l$ has a transform gate $t$ and a carry gate $c$ that control whether the layer's state is taken from its own transformed output or carried over from the previous layer's state:
$s_l^{[t]} = h_l^{[t]} \cdot t_l^{[t]} + s_{l-1}^{[t]} \cdot c_l^{[t]}$
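The per-layer update above can be sketched in numpy. This is a minimal illustration, not the paper's reference implementation: the parameter containers `W`, `R`, `b` and the function name `rhn_step` are assumptions, and the input `x` is fed only into the first layer of the stack, as in the RHN paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rhn_step(x, s, W, R, b):
    """One time step of a recurrent highway layer stack (sketch).

    W: dict with input weight matrices for 'h', 't', 'c' (layer 0 only)
    R: dict with one recurrent weight matrix per layer for 'h', 't', 'c'
    b: dict with one bias vector per layer for 'h', 't', 'c'
    s: recurrent state from the previous time step
    """
    depth = len(R['h'])
    for l in range(depth):
        # pre-activations for the transform h, transform gate t, carry gate c
        pre = {k: R[k][l] @ s + b[k][l] for k in ('h', 't', 'c')}
        if l == 0:
            # the external input enters only the first layer
            for k in ('h', 't', 'c'):
                pre[k] = pre[k] + W[k] @ x
        h = np.tanh(pre['h'])
        t = sigmoid(pre['t'])
        c = sigmoid(pre['c'])
        # s_l = h_l * t_l + s_{l-1} * c_l
        s = h * t + s * c
    return s
```

With the carry-gate biases pushed strongly positive and the transform-gate biases strongly negative, each layer copies its input state through almost unchanged, which is the highway shortcut that makes large step-to-step depths trainable.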