Addressing the Rare Word Problem in Neural Machine Translation on ShortScience.org

Addressing the Rare Word Problem in Neural Machine Translation
Luong, Thang and Sutskever, Ilya and Le, Quoc V. and Vinyals, Oriol and Zaremba, Wojciech
Association for Computational Linguistics - 2015 via Local Bibsonomy
Keywords: dblp

Summary by Shagun Sodhani 7 years ago

# Addressing the Rare Word Problem in Neural Machine Translation

## Introduction

* NMT (Neural Machine Translation) systems perform poorly on OOV (out-of-vocabulary) or rare words.
* The paper presents a word-alignment-based technique for translating such rare words.
* [Link to the paper](https://arxiv.org/abs/1410.8206)

## Technique

* Annotate the training corpus with information about which source-sentence word each OOV word in the target sentence corresponds to.
* The NMT system learns to track the alignment of rare words across source and target sentences and emits such alignments for the test sentences.
* As a post-processing step, use a dictionary to map rare words from the source language to the target language.
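
The post-processing step can be sketched as below. This is a minimal illustration rather than the authors' code: the function name `replace_unknowns`, the `<unk>` marker, the representation of the emitted alignment as a list of source indices, and the fallback of copying the source word when the dictionary has no entry are all assumptions made for the example.

```python
from typing import Dict, List, Optional


def replace_unknowns(
    source: List[str],
    target: List[str],
    aligned_source_index: List[Optional[int]],
    dictionary: Dict[str, str],
    unk: str = "<unk>",
) -> List[str]:
    """Replace each unknown target token using the source word it points to."""
    output = []
    for token, src_idx in zip(target, aligned_source_index):
        if token != unk:
            # In-vocabulary word: keep the model's output.
            output.append(token)
        elif src_idx is None or not 0 <= src_idx < len(source):
            # No usable alignment: leave the unknown marker in place.
            output.append(unk)
        else:
            src_word = source[src_idx]
            # Assumed fallback: copy the source word if it is not in the dictionary.
            output.append(dictionary.get(src_word, src_word))
    return output


# Toy data, hypothetical and only for illustration.
source = ["le", "portique", "ecotaxe"]
target = ["the", "<unk>", "<unk>"]
alignment = [None, 2, 1]  # target position -> aligned source position
dictionary = {"portique": "portico"}

print(replace_unknowns(source, target, alignment, dictionary))
# ['the', 'ecotaxe', 'portico']
```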

## Annotating the Corpus

### Copy Model

* Annotate the OOV words in the source sentence with tokens *unk1*, *unk2*, ..., such that repeated words get the same token.
* In the target sentence, each OOV word that is aligned to an OOV word in the source sentence is assigned the same token as that source word.
* An OOV word in the target sentence that has no alignment, or that is aligned with a known word in the source sentence, is assigned the null token.
* Pros
    * Very straightforward
* Cons
    * Misses out on words which are not labelled as OOV in the source language.
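
A minimal sketch of the Copy Model annotation, assuming a word alignment is already available as a mapping from target positions to source positions. The function name `annotate_copy`, the `unk_null` spelling of the null token, and the toy vocabularies are hypothetical choices for this example.

```python
from typing import Dict, List, Set, Tuple


def annotate_copy(
    source: List[str],
    target: List[str],
    alignment: Dict[int, int],  # target position -> source position
    src_vocab: Set[str],
    tgt_vocab: Set[str],
) -> Tuple[List[str], List[str]]:
    """Annotate one sentence pair following the Copy Model description."""
    unk_ids: Dict[str, int] = {}
    src_out = []
    for word in source:
        if word in src_vocab:
            src_out.append(word)
        else:
            # Repeated OOV words reuse the number they were given first.
            idx = unk_ids.setdefault(word, len(unk_ids) + 1)
            src_out.append(f"unk{idx}")

    tgt_out = []
    for j, word in enumerate(target):
        if word in tgt_vocab:
            tgt_out.append(word)
            continue
        i = alignment.get(j)
        if i is not None and source[i] not in src_vocab:
            # Aligned to a source OOV word: share that word's token.
            tgt_out.append(src_out[i])
        else:
            # Unaligned, or aligned to a known source word: null token.
            tgt_out.append("unk_null")
    return src_out, tgt_out


source = ["the", "ecotax", "portico", "in", "Pont-de-Buis"]
target = ["le", "portique", "ecotaxe", "de", "Pont-de-Buis"]
alignment = {0: 0, 1: 2, 2: 1, 3: 3, 4: 4}
src_out, tgt_out = annotate_copy(source, target, alignment, {"the", "in"}, {"le", "de"})
print(src_out)  # ['the', 'unk1', 'unk2', 'in', 'unk3']
print(tgt_out)  # ['le', 'unk2', 'unk1', 'de', 'unk3']
```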

### PosAll - Positional All Model

* All OOV words in the source language are assigned a single *unk* token.
* Every word in the target sentence is followed by a positional token that encodes the relative position *d = j - i*, denoting that the *jth* word in the target sentence is aligned to the *ith* word in the source sentence.
* Aligned words that are too far apart, or are unaligned, are assigned a null token.
* Pros
    * Captures complete alignment between source and target sentences.
* Cons
    * It doubles the length of target sentences.
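
The PosAll annotation can be sketched in the same way. This is an illustration under stated assumptions, not the authors' implementation: the name `annotate_posall`, the `p_<d>` and `p_null` token spellings, the convention `d = j - i`, and the `max_distance` cutoff for "too far apart" are all choices made for the example.

```python
from typing import Dict, List, Set, Tuple


def annotate_posall(
    source: List[str],
    target: List[str],
    alignment: Dict[int, int],  # target position j -> source position i
    src_vocab: Set[str],
    tgt_vocab: Set[str],
    max_distance: int = 7,  # assumed cutoff for "too far apart"
) -> Tuple[List[str], List[str]]:
    """Annotate one sentence pair following the PosAll description."""
    # A single universal unk token replaces every OOV word.
    src_out = [w if w in src_vocab else "unk" for w in source]

    tgt_out = []
    for j, word in enumerate(target):
        tgt_out.append(word if word in tgt_vocab else "unk")
        i = alignment.get(j)
        if i is None or abs(j - i) > max_distance:
            tgt_out.append("p_null")  # unaligned or too far apart
        else:
            tgt_out.append(f"p_{j - i}")  # relative position d = j - i
    return src_out, tgt_out


source = ["the", "ecotax", "portico", "in", "Pont-de-Buis"]
target = ["le", "portique", "ecotaxe", "de", "Pont-de-Buis"]
alignment = {0: 0, 1: 2, 2: 1, 4: 4}  # "de" is left unaligned
src_out, tgt_out = annotate_posall(source, target, alignment, {"the", "in"}, {"le", "de"})
print(tgt_out)
# ['le', 'p_0', 'unk', 'p_-1', 'unk', 'p_1', 'de', 'p_null', 'unk', 'p_0']
```

The annotated target has exactly twice as many tokens as the original target sentence, which is the cost listed under Cons.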

### PosUnk - Positional Unknown Model

* All OOV words in the source language are assigned a single *unk* token.
* Each OOV word in the target sentence is replaced by an *unk* token that also carries a position: the relative position *d = j - i* of the target word (at position *j*) with respect to its aligned source word (at position *i*).
* Pros
    * Faster than PosAll model.
* Cons
    * Does not capture alignment for all words.
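
A minimal sketch of the PosUnk annotation, together with a helper that recovers the aligned source position from an emitted token. The names `annotate_posunk` and `aligned_source_position`, the `unkpos_<d>` and `unkpos_null` token spellings, the convention `d = j - i`, and the `max_distance` cutoff are assumptions for illustration only.

```python
from typing import Dict, List, Optional, Set, Tuple


def annotate_posunk(
    source: List[str],
    target: List[str],
    alignment: Dict[int, int],  # target position j -> source position i
    src_vocab: Set[str],
    tgt_vocab: Set[str],
    max_distance: int = 7,  # assumed cutoff for usable alignments
) -> Tuple[List[str], List[str]]:
    """Annotate one sentence pair following the PosUnk description."""
    src_out = [w if w in src_vocab else "unk" for w in source]

    tgt_out = []
    for j, word in enumerate(target):
        if word in tgt_vocab:
            tgt_out.append(word)  # known words carry no position
            continue
        i = alignment.get(j)
        if i is None or abs(j - i) > max_distance:
            tgt_out.append("unkpos_null")
        else:
            tgt_out.append(f"unkpos_{j - i}")
    return src_out, tgt_out


def aligned_source_position(j: int, token: str) -> Optional[int]:
    """Invert d = j - i for a token emitted at target position j."""
    if not token.startswith("unkpos_") or token == "unkpos_null":
        return None
    return j - int(token.split("_", 1)[1])


source = ["the", "ecotax", "portico", "in", "Pont-de-Buis"]
target = ["le", "portique", "ecotaxe", "de", "Pont-de-Buis"]
alignment = {0: 0, 1: 2, 2: 1, 3: 3, 4: 4}
src_out, tgt_out = annotate_posunk(source, target, alignment, {"the", "in"}, {"le", "de"})
print(tgt_out)  # ['le', 'unkpos_-1', 'unkpos_1', 'de', 'unkpos_0']
print(aligned_source_position(1, tgt_out[1]))  # 2
```

Unlike PosAll, the target sentence keeps its original length, since the position is folded into the unknown token itself.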

## Experiments

* Dataset
    * Subset of WMT'14 dataset
* Alignment computed using the [Berkeley Aligner](https://code.google.com/archive/p/berkeleyaligner/)
* Used architecture from [Sequence to Sequence Learning with Neural Networks paper](https://gist.github.com/shagunsodhani/a2915921d7d0ac5cfd0e379025acfb9f).

## Results

* All three approaches improve the performance of existing NMT systems, with PosUnk in particular performing best; the order is PosUnk > PosAll > Copy.
* Ensemble models benefit more than individual models, as an ensemble of NMT models is better at aligning the OOV words.
* Performance gains are larger when using a smaller vocabulary.
* Rare word analysis shows that performance gains are larger when the proportion of OOV words is higher.