The authors propose a new deep architecture that combines the hierarchical structure of deep learning with the time-series modeling known from HMMs and recurrent neural networks. The proposed training algorithm builds the network layer by layer using supervised (pre-)training with a next-letter prediction objective. The experiments demonstrate that, after training very large networks for about 10 days, performance on a Wikipedia dataset published by Hinton et al. improves over previous work. The authors then analyze and discuss in detail how the network approaches its task: for example, long-term dependencies are modeled in the higher layers, and the correspondence between opening and closing parentheses is modeled as a “pseudo-stable attractor-like state”.
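
For concreteness, the layer-by-layer supervised (pre-)training with a next-letter objective could look roughly like the following sketch. This is an illustrative assumption, not the authors' actual setup: the stacked character-level RNN, layer sizes, optimizer, and training loop below are placeholders chosen for brevity.

```python
# Minimal sketch (assumed, not the authors' code): greedy layer-wise supervised
# pre-training of a stacked character-level RNN on next-character prediction.
import torch
import torch.nn as nn

vocab_size, hidden_size, num_layers = 128, 256, 3   # placeholder sizes

class StackedCharRNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.layers = nn.ModuleList()                    # recurrent layers added one at a time
        self.head = nn.Linear(hidden_size, vocab_size)   # next-character softmax head

    def add_layer(self):
        """Stack a new recurrent layer on top and re-initialise the output head."""
        self.layers.append(nn.RNN(hidden_size, hidden_size, batch_first=True))
        self.head = nn.Linear(hidden_size, vocab_size)

    def forward(self, x):
        h = self.embed(x)
        for rnn in self.layers:
            h, _ = rnn(h)
        return self.head(h)                              # logits at every time step

def train_layerwise(model, batches, epochs_per_layer=1):
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(num_layers):
        model.add_layer()
        # Only the newest layer and the fresh head are updated; earlier layers stay fixed.
        params = list(model.layers[-1].parameters()) + list(model.head.parameters())
        if len(model.layers) == 1:
            params += list(model.embed.parameters())     # train the embedding with the first layer
        opt = torch.optim.Adam(params, lr=1e-3)
        for _ in range(epochs_per_layer):
            for chars in batches:                        # chars: LongTensor of shape (batch, seq_len)
                inputs, targets = chars[:, :-1], chars[:, 1:]
                logits = model(inputs)
                loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
                opt.zero_grad()
                loss.backward()
                opt.step()
```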