First published: 2016/06/09

Abstract: Sequence-to-Sequence (seq2seq) modeling has rapidly become an important
general-purpose NLP tool that has proven effective for many text-generation and
sequence-labeling tasks. Seq2seq builds on deep neural language modeling and
inherits its remarkable accuracy in estimating local, next-word distributions.
In this work, we introduce a model and beam-search training scheme, based on
the work of Daumé III and Marcu (2005), that extends seq2seq to learn global
sequence scores. This structured approach avoids classical biases associated
with local training and unifies the training loss with the test-time usage,
while preserving the proven model architecture of seq2seq and its efficient
training approach. We show that our system outperforms a highly-optimized
attention-based seq2seq system and other baselines on three different
sequence-to-sequence tasks: word ordering, parsing, and machine translation.
This paper is covered by the author in this [talk](https://github.com/udibr/notes/blob/master/Talk%20by%20Sasha%20Rush%20-%20Interpreting%2C%20Training%2C%20and%20Distilling%20Seq2Seq%E2%80%A6.pdf).
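
As a rough illustration of the sequence-level training idea the abstract describes, here is a minimal sketch of a margin-style loss over beam prefixes: the gold prefix should outscore the strongest competing hypothesis on the beam at every decoding step. The function name, PyTorch usage, and pre-computed scores are all illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a beam-search-optimization-style margin loss.
# Assumes you already have, for each decoding step, the model's score of the
# gold prefix and the score of its strongest competitor on the beam.
import torch

def bso_margin_loss(gold_prefix_scores, competing_prefix_scores, margin=1.0):
    """Hinge loss over decoding steps: the gold prefix's score should beat
    the best competing beam hypothesis by at least `margin`; steps where the
    gold prefix already wins by the margin contribute zero loss."""
    violations = margin - (gold_prefix_scores - competing_prefix_scores)
    return torch.clamp(violations, min=0.0).sum()

# Toy usage: per-step scores for a 4-step gold prefix vs. the beam's best rival.
gold = torch.tensor([2.0, 1.5, 0.2, 3.0])
rival = torch.tensor([1.0, 1.8, 1.0, 0.5])
loss = bso_margin_loss(gold, rival)  # only steps 2 and 3 violate the margin
```

The paper's full training procedure also governs how the beam is constructed and reset after a margin violation during training; this sketch only shows the shape of the sequence-level objective that replaces per-word cross-entropy.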