Adversarial Training for Unsupervised Bilingual Lexicon Induction on ShortScience.org

doi.org
sci-hub
scholar.google.com

Adversarial Training for Unsupervised Bilingual Lexicon Induction
Zhang, Meng and Liu, Yang and Luan, Huanbo and Sun, Maosong
Association for Computational Linguistics - 2017 via Local Bibsonomy
Keywords: dblp

Summaries/Notes 1

[link] Summary by Tim Miller 7 years ago

Multilingual embeddings are useful for creating embeddings for low resource languages for things like transfer learning (e.g., learning a POS tagger in a low-resource language using training data from a high resource language). However, they typically require some small amount of supervision in the form of aligned corpora, seed pairs, or dictionaries. This approach attempts to learn a mapping from a source embedding space into a target embedding space without supervision.

The approach uses two networks a la adversarial training. One network (the generator) is parameterized by a projection matrix that attempts to map source words into the target space. The other network (the discriminator) attempts to discriminate true target embeddings from projected source embeddings. Since adversarial training is known to be unstable (a "research frontier" as the authors say), quite a bit of the paper describes tricks and training methods the authors investigated to get training to converge and understand how to select models.

They evaluate on many pairs, including both similar and dissimilar language pairs, and get very nice results. In summary, better than seed-based approaches with 0-100 seeds, competitive with 100-1000 seeds. Much of what would be traditional discussion is instead devoted to details of training regimen, so unfortunately there is little discussion of why this works. Given the difficulty one might encounter attempting to train this, I think it might be a little preliminary to try using this for applications, but continued research in training adversarial networks for NLP and properties of embedding spaces could potentially make this approach reliable enough for real applications.

Your comment: