Emergent Translation in Multi-Agent Communication
Jason Lee, Kyunghyun Cho, Jason Weston, Douwe Kiela
arXiv e-Print archive - 2017
Keywords:
cs.CL, cs.AI
First published: 2017/10/12
Abstract: While most machine translation systems to date are trained on large parallel
corpora, humans learn language in a different way: by being grounded in an
environment and interacting with other humans. In this work, we propose a
communication game where two agents, native speakers of their own respective
languages, jointly learn to solve a visual referential task. We find that the
ability to understand and translate a foreign language emerges as a means to
achieve shared goals. The emergent translation is interactive and multimodal,
and crucially does not require parallel corpora, but only monolingual,
independent text and corresponding images. Our proposed translation model
achieves this by grounding the source and target languages into a shared visual
modality, and outperforms several baselines on both word-level and
sentence-level translation tasks. Furthermore, we show that agents in a
multilingual community learn to translate better and faster than in a bilingual
communication setting.
The paper learns to translate from two monolingual image captioning datasets by pivoting through images: a speaker agent encodes an image and generates a caption in language A; this caption is then encoded into the same space as language B, and the resulting representation is optimised to be closest to the correct image among distractors. The whole pipeline is trained end-to-end, with the Gumbel-softmax making the discrete messages differentiable.
https://i.imgur.com/lnIsFNb.png
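A minimal sketch of this communication game, not the authors' code: a speaker emits a discrete message from an image via straight-through Gumbel-softmax, and a listener embeds the message and must pick the matching image among distractors, so gradients flow end-to-end through the discrete channel. The module names, dimensions, and the use of pre-extracted image features are my assumptions.

```python
# Sketch of the image-pivot referential game with a Gumbel-softmax channel.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, MSG_LEN, IMG_DIM, HID = 100, 5, 2048, 256  # assumed toy sizes

class Speaker(nn.Module):
    def __init__(self):
        super().__init__()
        self.img_proj = nn.Linear(IMG_DIM, HID)
        self.rnn = nn.GRUCell(VOCAB, HID)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, img_feats, tau=1.0):
        h = torch.tanh(self.img_proj(img_feats))      # condition on the target image
        tok = torch.zeros(img_feats.size(0), VOCAB, device=img_feats.device)
        msg = []
        for _ in range(MSG_LEN):
            h = self.rnn(tok, h)
            # straight-through Gumbel-softmax: one-hot forward, soft gradient backward
            tok = F.gumbel_softmax(self.out(h), tau=tau, hard=True)
            msg.append(tok)
        return torch.stack(msg, dim=1)                 # (B, MSG_LEN, VOCAB)

class Listener(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(VOCAB, HID)             # embeds (soft) one-hot tokens
        self.img_proj = nn.Linear(IMG_DIM, HID)

    def forward(self, msg, candidate_feats):
        m = self.embed(msg).mean(dim=1)                # (B, HID) message summary
        c = self.img_proj(candidate_feats)             # (B, K, HID) candidate images
        return torch.einsum("bh,bkh->bk", m, c)        # similarity to each candidate

speaker, listener = Speaker(), Listener()
opt = torch.optim.Adam(list(speaker.parameters()) + list(listener.parameters()), lr=1e-3)

# One toy training step with random "image features"; the target is candidate 0.
B, K = 8, 4
cands = torch.randn(B, K, IMG_DIM)
msg = speaker(cands[:, 0])                             # describe the target image
scores = listener(msg, cands)
loss = F.cross_entropy(scores, torch.zeros(B, dtype=torch.long))
loss.backward()
opt.step()
```

In the paper the two agents are native speakers of different languages trained on separate captioning corpora; translation emerges because the only way to succeed at the game is to ground both languages in the shared visual space.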