On Using Monolingual Corpora in Neural Machine Translation
Çağlar Gülçehre, Orhan Firat, Kelvin Xu, Kyunghyun Cho, Loïc Barrault, Huei-Chi Lin, Fethi Bougares, Holger Schwenk, Yoshua Bengio
arXiv e-Print archive - 2015 via Local Bibsonomy
Keywords: dblp
The authors extend a seq2seq model for MT with a separately trained neural language model. They first pre-train the seq2seq model on parallel data and the language model on monolingual data, then train a feedforward output component that takes the hidden states of both, gates the language model's contribution, and combines them to predict the next word (deep fusion). They compare this against simply combining the output probabilities of the two models at decoding time (shallow fusion) and show improvements on several MT datasets.
https://i.imgur.com/zD9jb4K.png
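The two fusion strategies can be sketched roughly as follows; this is a minimal PyTorch sketch with illustrative names and dimensions, not the authors' code, and it omits parts of the paper's full deep-fusion output layer (which also conditions on the attention context and the previous target word):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepFusionOutput(nn.Module):
    """Deep fusion (simplified): combine the NMT decoder state and the
    pre-trained LM state to predict the next target word.

    The LM state is scaled by a learned scalar gate computed from the LM
    state itself, so the model can down-weight the LM when it is unhelpful.
    """
    def __init__(self, dec_dim, lm_dim, vocab_size):
        super().__init__()
        self.gate = nn.Linear(lm_dim, 1)                   # g_t = sigmoid(v^T s_t^LM + b)
        self.proj = nn.Linear(dec_dim + lm_dim, vocab_size)

    def forward(self, dec_state, lm_state):
        g = torch.sigmoid(self.gate(lm_state))              # (batch, 1)
        fused = torch.cat([dec_state, g * lm_state], dim=-1)
        return F.log_softmax(self.proj(fused), dim=-1)      # (batch, vocab)

def shallow_fusion(nmt_log_probs, lm_log_probs, beta=0.1):
    """Shallow fusion: rescore the NMT distribution with the LM at decode
    time; beta is a tuned interpolation weight (value here is arbitrary)."""
    return nmt_log_probs + beta * lm_log_probs

# Usage with dummy states (hypothetical sizes):
dec_dim, lm_dim, vocab = 1000, 1000, 30000
fusion = DeepFusionOutput(dec_dim, lm_dim, vocab)
log_p = fusion(torch.randn(4, dec_dim), torch.randn(4, lm_dim))  # (4, 30000)
```

Note the key design difference: shallow fusion leaves both models frozen and only mixes their output scores, while deep fusion trains a new output layer (and gate) on top of the frozen hidden states, letting the combination be learned.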