Recurrent Neural Machine Translation
Biao Zhang, Deyi Xiong, and Jinsong Su
arXiv e-Print archive - 2016
Keywords:
cs.CL
First published: 2016/07/29
Abstract: The vanilla attention-based neural machine translation model has achieved promising
performance because of its ability to leverage varying-length source
annotations. However, this model still fails on long sentence
translation because it cannot capture long-term dependencies. In this
paper, we propose a novel recurrent neural machine translation model (RNMT), which
not only preserves the ability to model varying-length source annotations but
also better captures long-term dependencies. Instead of the conventional
attention mechanism, RNMT employs a recurrent neural network to extract the
context vector, where the target-side previous hidden state serves as its
initial state, and the source annotations serve as its inputs. We refer to this
new component as the contexter. Since the encoder, contexter, and decoder in our model
are all differentiable recurrent neural networks, our model can still be trained
end-to-end on large-scale corpora via stochastic gradient algorithms. Experiments on
Chinese-English translation tasks demonstrate the superiority of our model over
attention-based neural machine translation, especially on long sentences.
Furthermore, analysis of the contexter reveals that our model can implicitly
reflect the alignment to the source sentence.
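
The sketch below illustrates, in PyTorch, one plausible reading of the contexter described in the abstract: a recurrent network reads the source annotations with the target-side previous hidden state as its initial state, and its final hidden state is taken as the context vector. The choice of a GRU cell, the use of the final hidden state as the context vector, and all class, variable, and dimension names are illustrative assumptions, not the authors' exact formulation.

import torch
import torch.nn as nn

class Contexter(nn.Module):
    """Hypothetical sketch of a recurrent contexter (names and cell choice assumed)."""

    def __init__(self, annotation_dim: int, hidden_dim: int):
        super().__init__()
        # A GRU stands in for the unspecified recurrent network in the abstract.
        self.rnn = nn.GRU(annotation_dim, hidden_dim, batch_first=True)

    def forward(self, annotations: torch.Tensor, prev_decoder_state: torch.Tensor) -> torch.Tensor:
        # annotations:        (batch, src_len, annotation_dim) -- source annotations
        # prev_decoder_state: (batch, hidden_dim)              -- target-side previous hidden state
        h0 = prev_decoder_state.unsqueeze(0)   # (1, batch, hidden_dim) initial state
        _, h_n = self.rnn(annotations, h0)     # run the RNN over all source annotations
        return h_n.squeeze(0)                  # final hidden state used as the context vector

# Toy usage with assumed dimensions
contexter = Contexter(annotation_dim=512, hidden_dim=512)
annotations = torch.randn(2, 7, 512)   # 2 sentences, 7 source positions each
prev_state = torch.randn(2, 512)
context = contexter(annotations, prev_state)
print(context.shape)                   # torch.Size([2, 512])

Because every component here is an ordinary differentiable recurrent network, such a module can be dropped into an encoder-decoder model and trained end-to-end, which matches the training claim in the abstract.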