The authors propose to replace the notion of 'attention' in neural architectures with the notion of 'active memory', in which, rather than focusing on a single part of the memory, the model operates on the whole of it in parallel.
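To make that contrast concrete, here is a minimal numpy sketch, not the authors' code: attention reduces the memory to one softmax-weighted summary vector, while an active-memory step (illustrated with a 1-D convolution, in the spirit of the Neural GPU) rewrites every memory cell in parallel. The names `memory`, `query`, and `kernel` and their shapes are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_read(memory, query):
    # memory: (n, d), query: (d,) -> a single softmax-weighted summary vector
    weights = softmax(memory @ query)   # concentrates mass on a few positions
    return weights @ memory             # (d,)

def active_memory_step(memory, kernel):
    # memory: (n, d), kernel: (k, d, d) -> every cell is rewritten in parallel
    n, d = memory.shape
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(memory, ((pad, pad), (0, 0)))
    out = np.zeros_like(memory)
    for i in range(n):
        window = padded[i:i + k]        # (k, d) neighbourhood of cell i
        out[i] = np.tanh(np.einsum('kd,kde->e', window, kernel))
    return out                          # (n, d): the whole memory, updated
```

The convolutional update keeps the per-step cost linear in the memory size while still touching every position, which is the property the review's summary of 'active memory' refers to.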
This paper introduces an extension of Neural GPUs for machine translation. I found the experimental analysis section lacking, both in comparisons to state-of-the-art MT techniques and in a thorough evaluation of the proposed method.
This paper proposes active memory, a memory mechanism that operates on all parts of the memory in parallel. Active memory is compared to the attention mechanism, and it is shown to be more effective than attention for long-sentence English-French translation.
This paper proposes two new models for modeling sequential data in the sequence-to-sequence framework. The first is called the Markovian Neural GPU and the second is called the Extended Neural GPU. Both models are extensions of the Neural GPU model (Kaiser and Sutskever, 2016), but unlike the Neural GPU, the proposed models do not model the outputs independently but instead connect the output token distributions recursively. The paper provides empirical evidence on a machine translation task showing that the two proposed models perform better than the Neural GPU model and that the Extended Neural GPU performs on par with a GRU-based encoder-decoder model with attention.
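The key difference the review highlights, independent versus recursively connected output distributions, can be sketched as follows. This is an assumption-laden illustration rather than the paper's implementation: `memory` stands for the final memory tape, `out_proj` for an output projection, and `emb` for a token embedding table, all hypothetical names.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def independent_outputs(memory, out_proj):
    # Baseline Neural GPU: p(y_i | memory) for each position, with outputs
    # modeled independently given the final memory tape.
    return [softmax(cell @ out_proj) for cell in memory]

def markovian_outputs(memory, out_proj, emb):
    # Markovian-style decoding: position i additionally conditions on the
    # token emitted at position i-1, fed back through its embedding.
    outputs, prev = [], np.zeros(emb.shape[1])
    for cell in memory:
        probs = softmax((cell + prev) @ out_proj)
        y = int(probs.argmax())        # greedy choice, for illustration only
        outputs.append(y)
        prev = emb[y]
    return outputs
```

The Extended Neural GPU goes further than the Markovian variant by running a separate active-memory decoder whose state also carries the previously emitted tokens, but the recursive dependence on past outputs is the common thread.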