Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, et al.
arXiv e-Print archive - 2016 via Local arXiv
Keywords: cs.CL, cs.AI, cs.LG
First published: 2016/09/26
Abstract: Neural Machine Translation (NMT) is an end-to-end learning approach for
automated translation, with the potential to overcome many of the weaknesses of
conventional phrase-based translation systems. Unfortunately, NMT systems are
known to be computationally expensive both in training and in translation
inference. Also, most NMT systems have difficulty with rare words. These issues
have hindered NMT's use in practical deployments and services, where both
accuracy and speed are essential. In this work, we present GNMT, Google's
Neural Machine Translation system, which attempts to address many of these
issues. Our model consists of a deep LSTM network with 8 encoder and 8 decoder
layers using attention and residual connections. To improve parallelism and
therefore decrease training time, our attention mechanism connects the bottom
layer of the decoder to the top layer of the encoder. To accelerate the final
translation speed, we employ low-precision arithmetic during inference
computations. To improve handling of rare words, we divide words into a limited
set of common sub-word units ("wordpieces") for both input and output. This
method provides a good balance between the flexibility of "character"-delimited
models and the efficiency of "word"-delimited models, naturally handles
translation of rare words, and ultimately improves the overall accuracy of the
system. Our beam search technique employs a length-normalization procedure and
uses a coverage penalty, which encourages generation of an output sentence that
is most likely to cover all the words in the source sentence. On the WMT'14
English-to-French and English-to-German benchmarks, GNMT achieves results
competitive with the state of the art. Using a human side-by-side evaluation on a set of
isolated simple sentences, it reduces translation errors by an average of 60%
compared to Google's phrase-based production system.
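
The beam search scoring mentioned in the abstract (length normalization plus a coverage penalty) can be illustrated with a minimal sketch. The abstract does not give the formulas, so the forms below are an assumption based on how the full paper is usually summarized: a length penalty of ((5 + |Y|) / 6)^alpha and a coverage penalty of beta times the sum over source positions of log(min(total attention, 1)). The function names, the alpha and beta defaults, and the small epsilon clamp are illustrative choices, not values taken from the abstract.

```python
import math


def length_penalty(length, alpha=0.6):
    """Length normalization term; equals 1.0 for a one-token hypothesis."""
    return ((5.0 + length) / 6.0) ** alpha


def coverage_penalty(attention, beta=0.2, eps=1e-9):
    """Penalize source positions that received little total attention.

    attention[j][i] is the attention weight that target step j puts on
    source position i. The eps clamp (an assumption of this sketch)
    avoids log(0) when a source position is never attended to.
    """
    num_source = len(attention[0])
    total = 0.0
    for i in range(num_source):
        covered = sum(step[i] for step in attention)
        total += math.log(max(min(covered, 1.0), eps))
    return beta * total


def rescore(log_prob, length, attention, alpha=0.6, beta=0.2):
    """Combine the model log-probability with both heuristics."""
    return log_prob / length_penalty(length, alpha) + coverage_penalty(attention, beta)
```

With this scoring, dividing by the length penalty keeps longer hypotheses from being penalized simply for accumulating more negative log-probability terms, and the coverage term is zero only when every source position has received a total attention weight of at least 1.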