Towards Multi-Agent Communication-Based Language Learning
Angeliki Lazaridou
and
Nghia The Pham
and
Marco Baroni
arXiv e-Print archive - 2016 via Local arXiv
Keywords: cs.CL, cs.CV, cs.LG
First published: 2016/05/23

Abstract: We propose an interactive multimodal framework for language learning. Instead of being passively exposed to large amounts of natural text, our learners (implemented as feed-forward neural networks) engage in cooperative referential games starting from a tabula rasa setup, and thus develop their own language from the need to communicate in order to succeed at the game. Preliminary experiments provide promising results, but also suggest that it is important to ensure that agents trained in this way do not develop an ad hoc communication code only effective for the game they are playing.
This article makes the argument for *interactive* language learning, motivated by some nice recent successes in small domains. I can certainly agree with the motivation: if language is used in conversation, shouldn't we be building models which know how to behave in conversation?
The authors develop a standard multi-agent communication paradigm, in which two agents learn to communicate in a single-round reference game. (There are no references to, e.g., Kirby or any [ILM][1] work, which is in the same space.) Agent `A1` examines a referent `R` and "transmits" a one-hot utterance representation to `A2`, who must identify `R` given the utterance. A one-round conversation succeeds when `A2` picks the correct referent `R`. The two agents are jointly trained to maximize this success rate via REINFORCE (policy gradient).
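To make the training loop above concrete, here is a minimal sketch of a single-round reference game trained with REINFORCE. It is not the paper's model: referents are bare indices rather than images, both agents are single linear layers, and the sizes, the learning rate, the running-mean reward baseline, and the helper names (`train`, `onehot_minus_probs`) are all hypothetical choices made for illustration.

```python
import numpy as np

# Toy sizes; all of these are illustrative assumptions, not the paper's setup.
N_OBJECTS = 4       # distinct referents, each represented only by its index
VOCAB = 4           # number of one-symbol utterances A1 can send
N_CANDIDATES = 2    # referents shown to A2: the target R plus one distractor
LR = 0.5            # learning rate on the logits


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()


def onehot_minus_probs(probs, idx):
    """Gradient of log softmax(z)[idx] with respect to the logits z."""
    g = -probs          # new array; probs itself is left untouched
    g[idx] += 1.0
    return g


def train(n_rounds=20000, seed=0):
    """Hypothetical helper: play n_rounds games, updating both agents each round."""
    rng = np.random.default_rng(seed)
    W_s = np.zeros((N_OBJECTS, VOCAB))   # A1 (sender): utterance logits per referent
    W_r = np.zeros((VOCAB, N_OBJECTS))   # A2 (receiver): score per (utterance, referent)
    baseline = 0.0                       # running mean reward (variance reduction)
    rewards = []
    for _ in range(n_rounds):
        # Draw distinct candidates in random order; one of them is the target R.
        cands = rng.choice(N_OBJECTS, size=N_CANDIDATES, replace=False)
        target_pos = rng.integers(N_CANDIDATES)
        target = cands[target_pos]

        # A1 sees only R and samples a one-hot utterance m ~ pi_s(m | R).
        p_msg = softmax(W_s[target])
        m = rng.choice(VOCAB, p=p_msg)

        # A2 sees m and the candidates, and samples a guess among them.
        p_guess = softmax(W_r[m, cands])
        guess = rng.choice(N_CANDIDATES, p=p_guess)

        # Shared reward: 1 if A2 picked R, else 0.
        reward = 1.0 if guess == target_pos else 0.0
        advantage = reward - baseline

        # REINFORCE: each agent steps along advantage * grad log-prob of its own action.
        W_s[target] += LR * advantage * onehot_minus_probs(p_msg, m)
        W_r[m, cands] += LR * advantage * onehot_minus_probs(p_guess, guess)

        baseline = 0.99 * baseline + 0.01 * reward
        rewards.append(reward)
    return W_s, W_r, rewards


if __name__ == "__main__":
    _, _, rewards = train()
    print("success rate over the last 2000 rounds:", np.mean(rewards[-2000:]))
```

Each agent differentiates only the log-probability of its own sampled action, so no gradient flows through the discrete utterance; this is why a score-function estimator is needed. Subtracting the baseline changes the variance of the updates, not their expected value.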
**This is mathematically equivalent to [the NVIL model (Mnih and Gregor, 2014)][2]**, an autoencoder with "hard" latent codes which is likewise trained by policy gradient methods.
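The equivalence claim above can be made concrete by writing the two gradient estimators side by side. The notation is introduced here for illustration and is not taken from either paper: $C$ is `A2`'s candidate set, $\theta$ and $\phi$ are the sender's and receiver's parameters, $\hat{R}$ is `A2`'s guess, and $b$ is an optional baseline (NVIL uses learned, input-dependent baselines; whether this paper uses one is not stated here).

$$
J(\theta, \phi) = \mathbb{E}_{m \sim \pi_\theta(\cdot \mid R),\; \hat{R} \sim \pi_\phi(\cdot \mid m, C)}\big[\, r \,\big], \qquad r = \mathbf{1}[\hat{R} = R]
$$

$$
\nabla_\theta J = \mathbb{E}\big[(r - b)\, \nabla_\theta \log \pi_\theta(m \mid R)\big], \qquad
\nabla_\phi J = \mathbb{E}\big[(r - b)\, \nabla_\phi \log \pi_\phi(\hat{R} \mid m, C)\big]
$$

NVIL's inference-network gradient, for an observation $x$ and a discrete latent $h$:

$$
\nabla_\psi \mathcal{L}(x) = \mathbb{E}_{q_\psi(h \mid x)}\big[(\ell(x, h) - b(x))\, \nabla_\psi \log q_\psi(h \mid x)\big], \qquad \ell(x, h) = \log p(x, h) - \log q_\psi(h \mid x)
$$

Reading $R$ as $x$, the utterance $m$ as $h$, and `A1` as the inference network, the sender update has the same score-function form as NVIL's, with the game reward in place of the learning signal $\ell(x, h)$. The analogy is tightest on the sender side: NVIL trains its generative model with a direct gradient of $\log p(x, h)$, whereas a receiver that samples its guess, as described above, is itself trained by REINFORCE.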
They perform a nice, thorough evaluation of both successful and "cheating" models. This will serve as a useful reference point and starting point for people interested in interactive language acquisition.
The way forward is clear, I think: let's develop agents in more complex environments, interacting in multi-round conversations, with more complex / longer utterances.
[1]: http://cocosci.berkeley.edu/tom/papers/IteratedLearningEvolutionLanguage.pdf
[2]: https://arxiv.org/abs/1402.0030