Neural Machine Translation by Jointly Learning to Align and Translate
Bahdanau, Dzmitry and Cho, Kyunghyun and Bengio, Yoshua
arXiv e-Print archive - 2014 via Local Bibsonomy
Keywords: dblp
One core aspect of this attention approach is that it makes the learned representation easy to inspect: the softmax output over the input words (later called $\alpha_{ij}$) can be visualized for each output word, as shown below.
https://i.imgur.com/Kb7bk3e.png
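As a concrete illustration, here is a minimal sketch of how such a heat map can be plotted, assuming the per-step softmax outputs have been collected into a hypothetical `alpha` array of shape (number of output words, number of input words); the token lists and the random weights below are placeholders for what the trained model would produce.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical sentence pair; in practice `alpha` would be collected from the
# decoder (one softmax distribution over the input words per output word).
src_tokens = ["the", "agreement", "on", "the", "European", "Economic", "Area", "."]
tgt_tokens = ["l'", "accord", "sur", "la", "zone", "économique", "européenne", "."]

rng = np.random.default_rng(0)
alpha = rng.random((len(tgt_tokens), len(src_tokens)))
alpha /= alpha.sum(axis=1, keepdims=True)  # each row sums to 1, like a softmax output

fig, ax = plt.subplots()
ax.imshow(alpha, cmap="gray", vmin=0.0, vmax=1.0)
ax.set_xticks(range(len(src_tokens)))
ax.set_xticklabels(src_tokens, rotation=90)
ax.set_yticks(range(len(tgt_tokens)))
ax.set_yticklabels(tgt_tokens)
ax.set_xlabel("input (source) words")
ax.set_ylabel("output (target) words")
plt.tight_layout()
plt.show()
```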
In this approach, at each decoding step the RNN attends over all of the encoder's hidden states (one score per input position, so the input length can vary), applies a softmax to those scores, and uses the resulting probabilities to weight and sum the states. This weighted sum forms the memory each step uses to make its prediction, bypassing the need for the network to encode the entire input into the single state passed between units.
Each hidden unit is computed as:
$$s_i = f(s_{i-1}, y_{i-1}, c_i)$$
Here $s_{i-1}$ is the previous decoder state and $y_{i-1}$ is the previous target word. The new ingredient is $c_i$, the context vector, which carries the memory of the input phrase:
$$c_i = \sum_{j=1}^{T_x} \alpha_{ij} h_j$$
Here the sum runs over all $T_x$ input positions, and $\alpha_{ij}$ is the output of a softmax over alignment scores $e_{ij} = a(s_{i-1}, h_j)$, which measure how well the input around position $j$ matches the output at position $i$. $h_j$ is the encoder's hidden state from when the RNN was processing position $j$ of the input sequence.
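To make the pieces concrete, here is a minimal numpy sketch of a single decoding step under the equations above. All weight names (`W_a`, `U_a`, `v_a`, `W_s`) and the toy dimensions are illustrative, and a plain $\tanh$ layer stands in for the gated recurrent unit $f$ used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
T_x, enc_dim, dec_dim, emb_dim = 6, 8, 8, 4   # toy sizes

# Encoder hidden states h_1..h_{T_x}, previous decoder state s_{i-1},
# and the embedding of the previous target word y_{i-1}.
h = rng.normal(size=(T_x, enc_dim))
s_prev = rng.normal(size=dec_dim)
y_prev = rng.normal(size=emb_dim)

# Illustrative parameters of the alignment model a(s_{i-1}, h_j) and of f.
W_a = rng.normal(size=(dec_dim, dec_dim))
U_a = rng.normal(size=(dec_dim, enc_dim))
v_a = rng.normal(size=dec_dim)
W_s = rng.normal(size=(dec_dim, dec_dim + emb_dim + enc_dim))

# Alignment scores e_{ij}, then attention weights alpha_{ij} via a softmax over j.
e = np.array([v_a @ np.tanh(W_a @ s_prev + U_a @ h_j) for h_j in h])
alpha = np.exp(e - e.max())
alpha /= alpha.sum()

# Context vector c_i = sum_j alpha_{ij} h_j.
c = alpha @ h

# New decoder state s_i = f(s_{i-1}, y_{i-1}, c_i); here f is a simple tanh layer
# rather than the gated unit used in the paper.
s = np.tanh(W_s @ np.concatenate([s_prev, y_prev, c]))

print(alpha.round(3), alpha.sum())   # probabilities over input positions, sums to 1
print(s.shape)                       # (dec_dim,)
```

The `alpha` vector computed here is exactly what gets stacked row by row (one row per output word) to produce the visualization shown above.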