Word Translation Without Parallel Data
Alexis Conneau, Guillaume Lample, Marc'Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou
arXiv e-Print archive - 2017 via Local arXiv
Keywords: cs.CL
First published: 2017/10/11
Abstract: State-of-the-art methods for learning cross-lingual word embeddings have
relied on bilingual dictionaries or parallel corpora. Recent works showed that
the need for parallel data supervision can be alleviated with character-level
information. While these methods showed encouraging results, they are not on
par with their supervised counterparts and are limited to pairs of languages
sharing a common alphabet. In this work, we show that we can build a bilingual
dictionary between two languages without using any parallel corpora, by
aligning monolingual word embedding spaces in an unsupervised way. Without
using any character information, our model even outperforms existing supervised
methods on cross-lingual tasks for some language pairs. Our experiments
demonstrate that our method works very well also for distant language pairs,
like English-Russian or English-Chinese. We finally show that our method is a
first step towards fully unsupervised machine translation and describe
experiments on the English-Esperanto language pair, on which there only exists
a limited amount of parallel data.
The paper induces word translations using only monolingual corpora for the two languages. Separate embeddings are trained for each language, and a linear mapping between the two spaces is learned through an adversarial objective, with an orthogonality constraint on the mapping and training restricted to the most frequent words. An unsupervised stopping criterion is also proposed.
https://i.imgur.com/HmME09P.png
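
To make the role of the orthogonality constraint more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names (`orthogonalize`, `procrustes`, `translate`) are hypothetical, the adversarial training loop is omitted, and the sketch assumes row-wise embedding matrices and plain cosine nearest neighbours for retrieval. It shows one way to pull a mapping back toward orthogonality after a gradient step, the closed-form orthogonal solution when some word pairs are already aligned, and how translations could be read off once a mapping is available.

```python
import numpy as np

def orthogonalize(W, beta=0.01):
    """One step nudging W toward an orthogonal matrix:
    W <- (1 + beta) * W - beta * (W W^T) W."""
    return (1 + beta) * W - beta * (W @ W.T) @ W

def procrustes(X, Y):
    """Orthogonal W minimizing ||X W^T - Y||_F, where row i of X (source)
    is assumed to be paired with row i of Y (target)."""
    U, _, Vt = np.linalg.svd(Y.T @ X)
    return U @ Vt

def translate(X_src, Y_tgt, W):
    """For each source embedding, return the index of the nearest
    target embedding (cosine similarity) after mapping with W."""
    mapped = X_src @ W.T
    mapped = mapped / np.linalg.norm(mapped, axis=1, keepdims=True)
    tgt = Y_tgt / np.linalg.norm(Y_tgt, axis=1, keepdims=True)
    return (mapped @ tgt.T).argmax(axis=1)
```

In a full pipeline, `orthogonalize` would be applied after each update of the mapping during adversarial training, and `procrustes` would be run on word pairs induced from the current mapping. Both uses are assumptions about how the pieces fit together rather than details taken from the summary above.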