First published: 2018/10/31
Abstract: In NMT, how far can we get without attention and without separate encoding
and decoding? To answer that question, we introduce a recurrent neural
translation model that does not use attention and does not have a separate
encoder and decoder. Our eager translation model is low-latency, writing target
tokens as soon as it reads the first source token, and uses constant memory
during decoding. It performs on par with the standard attention-based model of
Bahdanau et al. (2014) and outperforms it on long sentences.
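To make the "eager" idea concrete, below is a minimal sketch of such a model: a single recurrent network reads one source token per step and immediately emits a target token, so decoding carries only a fixed-size hidden state. The vocabulary sizes, embedding and hidden dimensions, the choice of an LSTM cell, and the greedy emission are illustrative assumptions, not details taken from the abstract.

```python
import torch
import torch.nn as nn


class EagerTranslator(nn.Module):
    """Sketch of an attention-free, eager translation model.

    One recurrent cell serves as both "encoder" and "decoder": at every
    step it consumes the current source token together with the previously
    emitted target token and predicts the next target token. Memory during
    decoding is constant because only the recurrent state is carried.
    """

    def __init__(self, src_vocab, tgt_vocab, emb_dim=256, hid_dim=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.rnn = nn.LSTMCell(2 * emb_dim, hid_dim)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src_tokens, bos_id=1):
        batch = src_tokens.size(0)
        h = src_tokens.new_zeros(batch, self.rnn.hidden_size, dtype=torch.float)
        c = torch.zeros_like(h)
        prev_tgt = src_tokens.new_full((batch,), bos_id)
        outputs = []
        # Read one source token, write one target token, at every step.
        for t in range(src_tokens.size(1)):
            x = torch.cat(
                [self.src_emb(src_tokens[:, t]), self.tgt_emb(prev_tgt)], dim=-1
            )
            h, c = self.rnn(x, (h, c))
            logits = self.out(h)
            prev_tgt = logits.argmax(dim=-1)  # greedy emission (assumption)
            outputs.append(logits)
        return torch.stack(outputs, dim=1)


# Usage: a batch of 4 source sentences of length 12 yields one target
# distribution per source position.
model = EagerTranslator(src_vocab=1000, tgt_vocab=1000)
src = torch.randint(0, 1000, (4, 12))
logits = model(src)  # shape (4, 12, tgt_vocab)
```

Note that the full model described in the paper also has to cope with word reordering (e.g. by allowing the target side to lag behind the source with padding tokens); the sketch above ignores that and simply emits one token per source position.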