Neural Machine Translation with Recurrent Attention Modeling
Zichao Yang, Zhiting Hu, Yuntian Deng, Chris Dyer, Alex Smola
arXiv e-Print archive - 2016 via Local arXiv
Keywords:
cs.NE, cs.CL
First published: 2016/07/18
Abstract: Knowing which words have been attended to in previous time steps while
generating a translation is a rich source of information for predicting what
words will be attended to in the future. We improve upon the attention model of
Bahdanau et al. (2014) by explicitly modeling the relationship between previous
and subsequent attention levels for each word using one recurrent network per
input word. This architecture easily captures informative features, such as
fertility and regularities in relative distortion. In experiments, we show our
parameterization of attention improves translation quality.
TLDR; The standard attention model does not take into account the "history" of attention activations, even though this history should be a good predictor of what to attend to next. The authors augment a seq2seq network with a dynamic memory that, for each input word, keeps track of its attention activations over time. The model is evaluated on English-German and English-Chinese NMT tasks and beats competing models.
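To make the mechanism concrete, here is a minimal sketch of how I read the recurrent-attention idea: each source word gets a small recurrent memory that is updated with the attention weight it just received, and that memory feeds back into the next step's attention scores. The dimensions, parameter names (`W_s`, `W_h`, `W_m`, `U_m`), and update rules below are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

# Illustrative sketch of recurrent attention (my reading, not the authors' code).
rng = np.random.default_rng(0)
T, d_enc, d_dec, d_mem = 6, 8, 8, 4           # source length, hidden sizes

H = rng.standard_normal((T, d_enc))           # encoder annotations h_1..h_T
s = rng.standard_normal(d_dec)                # current decoder state s_t

# One small recurrent memory per source word: m_j summarizes how much
# word j has been attended to so far (captures fertility / coverage).
M = np.zeros((T, d_mem))

# Attention parameters (randomly initialized for the sketch).
W_s = rng.standard_normal((d_dec, d_mem)) * 0.1
W_h = rng.standard_normal((d_enc, d_mem)) * 0.1
W_m = rng.standard_normal((d_mem, d_mem)) * 0.1
v   = rng.standard_normal(d_mem) * 0.1
U_m = rng.standard_normal((d_mem, d_mem)) * 0.1
u_a = rng.standard_normal(d_mem) * 0.1

def attend(s, H, M):
    """Score each source word from decoder state, annotation, and its memory."""
    e = np.tanh(s @ W_s + H @ W_h + M @ W_m) @ v
    a = np.exp(e - e.max())
    return a / a.sum()                        # softmax over source positions

def update_memory(M, a):
    """Per-word recurrent update driven by the attention weight just assigned."""
    return np.tanh(M @ U_m + np.outer(a, u_a))

for step in range(3):                         # a few decoding steps
    a = attend(s, H, M)                       # attention over source words
    context = a @ H                           # context vector fed to the decoder
    M = update_memory(M, a)                   # remember who has been attended to
    print(f"step {step}: attention = {np.round(a, 3)}")
```

The point of the per-word recurrence is that features like fertility (how often a word has already been attended to) and relative distortion can be learned rather than ignored, which is exactly what the standard Bahdanau-style attention cannot do.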
#### Notes
- How expensive is this, and how much more difficult are these networks to train?
- Sequentially attending to neighboring words makes sense for some language pairs, but not for others. The method also seems rather restricted because it only takes into account a window of k time steps.