Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models on ShortScience.org

arxiv.org
scholar.google.com

Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models
Luong, Minh-Thang and Manning, Christopher D.
arXiv e-Print archive - 2016 via Local Bibsonomy
Keywords: dblp

Summaries/Notes 2

[link] Summary by Denny Britz 9 years ago

TLDR; The authors train a word-level NMT where UNK tokens in both source and target sentence are replaced by character-level RNNs that produce word representations. The authors can thus train a fast word-based system that still generalized that doesn't produce unknown words. The best system achieves a new state of the art BLEU score of 19.9 in WMT'15 English to Czech translation.


#### Key Points

- Source Sentence: Final hidden state of character-RNN is used as word representation.
- Source Sentence: Character RNNs always initialized with 0 state to allow efficient pre-training 
- Target: Produce word-level sentence including UNK first and then run the char-RNNs
- Target: Two ways to initialize char-RNN: With same hidden state as word-RNN (same-path), or with its own representation (separate-path)
- Authors find that attention mechanism is critical for pure character-based NMT models


#### Notes

- Given that the authors demonstrate the potential of character-based models, is the hybrid approach the right direction? If we had more compute power, would pure character-based models win?

Your comment:

Write your summary here (You can use $\LaTeX$ and markdown syntax):

Anon Private