Towards cross-lingual distributed representations without parallel text trained with adversarial autoencoders on ShortScience.org


Towards cross-lingual distributed representations without parallel text trained with adversarial autoencoders
Antonio Valerio Miceli Barone
arXiv e-Print archive - 2016 via Local arXiv
Keywords: cs.CL, cs.LG, cs.NE

Summaries/Notes 1

Summary by Jon Gauthier 7 years ago

This is a simple unsupervised method for learning word-level translation
between embeddings of two different languages.

That's right -- unsupervised.

The basic motivating hypothesis is that there should be an isomorphism between
the "semantic spaces" of different languages:

> we hypothesize that, if languages are used to convey thematically similar information in similar contexts, these random processes should be approximately isomorphic between languages, and that this isomorphism can be learned from the statistics of the realizations of these processes, the monolingual corpora, in principle without any form of explicit alignment.

If you squint a bit, this premise supports a more aggressive claim: that a
nonlinear mapping (e.g. an MLP) between *word embedding spaces* recovers the
same correspondence.

The author uses the adversarial autoencoder (AAE; Makhzani et al., 2015)
framework to enforce a cross-lingual semantic mapping between word embedding
spaces. The basic setup for adversarial training between a source and a target
language:

1. Sample a batch of words from the source language according to the language's
word frequency distribution.
2. Sample a batch of words from the target language according to its word
frequency distribution. (No sort of relationship is enforced between the two
samples here.)
3. Feed the word embeddings corresponding to the source words through an
*encoder* MLP. This corresponds to the standard "generator" in a GAN setup.
4. Pass the generator output to a *discriminator* MLP along with the
target-language word embeddings; the discriminator tries to tell the two apart.
5. Also pass the generator output to a *decoder* which maps back to the source
embedding distribution.
6. Update weights based on a combination of GAN loss + reconstruction loss.
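
To make the six steps concrete, here is a minimal forward-pass sketch in plain Python. It is not the author's code. The names (`sample_batch`, `mlp`, `losses`), the one-hidden-layer tanh MLPs, the squared-error reconstruction term, the non-saturating generator loss, and the toy vocabularies with random embeddings are all assumptions made for illustration. The actual weight update in step 6 is left out; in practice you would hand these losses to an autodiff framework.

```python
import math
import random


def sample_batch(vocab, freqs, batch_size, rng):
    """Steps 1-2: draw words in proportion to their corpus frequency."""
    return rng.choices(vocab, weights=freqs, k=batch_size)


def linear(W, b, x):
    """Affine map W x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def mlp(params, x):
    """One-hidden-layer MLP with tanh (an assumed architecture)."""
    W1, b1, W2, b2 = params
    hidden = [math.tanh(v) for v in linear(W1, b1, x)]
    return linear(W2, b2, hidden)


def init_mlp(n_in, n_hidden, n_out, rng):
    """Small random weights, zero biases."""
    def mat(rows, cols):
        return [[rng.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]
    return (mat(n_hidden, n_in), [0.0] * n_hidden,
            mat(n_out, n_hidden), [0.0] * n_out)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def losses(src_vecs, tgt_vecs, enc, dec, disc):
    """Return (discriminator loss, generator loss, reconstruction loss)."""
    eps = 1e-12  # guards against log(0)
    d_loss = g_loss = r_loss = 0.0
    for s in src_vecs:
        z = mlp(enc, s)                          # step 3: encoder = generator
        p_fake = sigmoid(mlp(disc, z)[0])        # step 4: discriminator on fakes
        d_loss += -math.log(1.0 - p_fake + eps)  # discriminator wants p_fake -> 0
        g_loss += -math.log(p_fake + eps)        # generator wants p_fake -> 1
        s_hat = mlp(dec, z)                      # step 5: decode back to source
        r_loss += sum((a - b) ** 2 for a, b in zip(s, s_hat))
    for t in tgt_vecs:
        p_real = sigmoid(mlp(disc, t)[0])        # step 4: discriminator on reals
        d_loss += -math.log(p_real + eps)
    n_src = len(src_vecs)
    return d_loss / (n_src + len(tgt_vecs)), g_loss / n_src, r_loss / n_src


# Toy setup: made-up vocabularies, counts, and random embeddings.
rng = random.Random(0)
dim, batch_size = 4, 8
src_vocab, src_freq = ["the", "cat", "sat"], [100, 5, 2]
tgt_vocab, tgt_freq = ["le", "chat", "assis"], [90, 6, 3]
src_emb = {w: [rng.gauss(0, 1) for _ in range(dim)] for w in src_vocab}
tgt_emb = {w: [rng.gauss(0, 1) for _ in range(dim)] for w in tgt_vocab}

enc = init_mlp(dim, 8, dim, rng)   # source space -> target space
dec = init_mlp(dim, 8, dim, rng)   # target space -> source space
disc = init_mlp(dim, 8, 1, rng)    # target space -> real/fake logit

src_batch = sample_batch(src_vocab, src_freq, batch_size, rng)
tgt_batch = sample_batch(tgt_vocab, tgt_freq, batch_size, rng)
d_loss, g_loss, r_loss = losses(
    [src_emb[w] for w in src_batch],
    [tgt_emb[w] for w in tgt_batch],
    enc, dec, disc,
)
# Step 6 would minimise d_loss w.r.t. disc, and g_loss + r_loss w.r.t. enc/dec.
encoder_objective = g_loss + r_loss
```

Note that the two batches are sampled independently, so nothing in the loss ever pairs a source word with its translation. Any alignment has to come from the discriminator pushing the encoder output toward the target embedding distribution while the decoder keeps the mapping invertible.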

### Does it work?

We don't really know. The paper is unfortunately short on evaluation: we
just see a few examples of success and failure from a trained model. One easy
evaluation would be to compare lexical-mapping accuracy against the corpus
frequency of the source word. I would bet that this would reveal the model
hasn't done much more than learn to align a small set of high-frequency words.
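
The frequency-bucketed evaluation suggested above could look roughly like this. It is a sketch, not something from the paper: `accuracy_by_frequency`, the bucket edges, and the toy dictionaries are all hypothetical, and the numbers are placeholders rather than results.

```python
from collections import defaultdict


def accuracy_by_frequency(predictions, gold, freq, bucket_edges):
    """Translation accuracy per source-word frequency bucket.

    predictions:  source word -> predicted target word
    gold:         source word -> set of acceptable translations
    freq:         source word -> corpus count
    bucket_edges: ascending lower bounds; the first must be <= every count
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for word, refs in gold.items():
        # Assign the word to the highest bucket whose lower bound it reaches.
        lower = max(edge for edge in bucket_edges if edge <= freq[word])
        totals[lower] += 1
        hits[lower] += predictions.get(word) in refs
    return {edge: hits[edge] / totals[edge] for edge in sorted(totals)}


# Toy placeholder data, invented purely to exercise the function.
gold = {"the": {"le", "la"}, "cat": {"chat"}, "sat": {"assis"}, "quark": {"quark"}}
predictions = {"the": "le", "cat": "chat", "sat": "chien", "quark": "maison"}
freq = {"the": 1000, "cat": 50, "sat": 20, "quark": 1}

result = accuracy_by_frequency(predictions, gold, freq, bucket_edges=[0, 10, 100])
print(result)  # {0: 0.0, 10: 0.5, 100: 1.0}
```

If the bet is right, a real run would show accuracy falling off sharply in the low-frequency buckets, much like the shape of this toy output.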
