Towards cross-lingual distributed representations without parallel text trained with adversarial autoencoders
Antonio Valerio Miceli Barone
arXiv e-Print archive - 2016
Keywords:
cs.CL, cs.LG, cs.NE
First published: 2016/08/09
Abstract: Current approaches to learning vector representations of text that are compatible between different languages usually require some amount of parallel text, aligned at word, sentence, or at least document level. We hypothesize, however, that different natural languages share enough semantic structure that it should be possible, in principle, to learn compatible vector representations just by analyzing the monolingual distribution of words.
In order to evaluate this hypothesis, we propose a scheme to map word vectors trained on a source language to vectors semantically compatible with word vectors trained on a target language using an adversarial autoencoder.
We present preliminary qualitative results and discuss possible future developments of this technique, such as applications to cross-lingual sentence representations.
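
The abstract gives only the high-level idea, not implementation details, but a minimal sketch of an adversarial autoencoder for this mapping task might look like the following: an encoder maps source-language word vectors into the target embedding space, a reconstruction (decoder) term keeps the mapping information-preserving, and a discriminator pushes the distribution of mapped vectors toward the distribution of real target-language vectors. All layer sizes, architectures, optimizers, and loss weights below are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

DIM = 300  # embedding dimensionality (assumed; the paper's choice may differ)

# Encoder: maps source-language word vectors into the target embedding space.
encoder = nn.Sequential(nn.Linear(DIM, DIM), nn.Tanh(), nn.Linear(DIM, DIM))
# Decoder: reconstructs the source vector, keeping the mapping information-preserving.
decoder = nn.Sequential(nn.Linear(DIM, DIM), nn.Tanh(), nn.Linear(DIM, DIM))
# Discriminator: tries to tell mapped source vectors from genuine target vectors.
discriminator = nn.Sequential(
    nn.Linear(DIM, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1)
)

bce = nn.BCEWithLogitsLoss()
opt_ae = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-4
)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)

def train_step(src_batch, tgt_batch):
    """One adversarial-autoencoder update on batches of monolingual word vectors.

    Note that src_batch and tgt_batch are unaligned: no parallel data is used.
    """
    # Discriminator step: real target vectors vs. mapped (detached) source vectors.
    mapped = encoder(src_batch).detach()
    d_loss = bce(discriminator(tgt_batch), torch.ones(len(tgt_batch), 1)) \
           + bce(discriminator(mapped), torch.zeros(len(mapped), 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Autoencoder step: reconstruct the source vector while fooling the discriminator.
    z = encoder(src_batch)
    recon_loss = nn.functional.mse_loss(decoder(z), src_batch)
    adv_loss = bce(discriminator(z), torch.ones(len(z), 1))
    g_loss = recon_loss + adv_loss  # equal weighting is an arbitrary choice here
    opt_ae.zero_grad()
    g_loss.backward()
    opt_ae.step()
    return d_loss.item(), g_loss.item()
```

In this sketch, the only training signal comes from the two monolingual embedding distributions, which is what would make the approach work without word-, sentence-, or document-aligned text; whether a simple feed-forward mapping like this suffices in practice is exactly what the paper's preliminary experiments probe.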