First published: 2021/11/28

Abstract: Communication requires having a common language, a lingua franca, between
agents. This language could emerge via a consensus process, but it may require
many generations of trial and error. Alternatively, the lingua franca can be
given by the environment, where agents ground their language in representations
of the observed world. We demonstrate a simple way to ground language in
learned representations, which facilitates decentralized multi-agent
communication and coordination. We find that a standard representation learning
algorithm -- autoencoding -- is sufficient for arriving at a grounded common
language. When agents broadcast these representations, they learn to understand
and respond to each other's utterances and achieve surprisingly strong task
performance across a variety of multi-agent communication environments.
In certain classes of multi-agent cooperation games, it's useful for agents to be able to coordinate on future actions, which is an obvious use case for a communication channel between players. However, prior work in multi-agent RL has shown that it's surprisingly hard to train agents that (1) consistently learn to use a communication channel informatively rather than randomly, and (2) if they do communicate, arrive at a common grounding for the meaning of symbols so that they can use them effectively.
This paper suggests a straightforward and clever approach: instead of having agents communicate using arbitrary vectors produced as part of a policy, it ties the communication vectors directly to the content of an agent's observations. Specifically, the image encoding used for making policy decisions is passed through an autoencoder, and the bottleneck activation at the middle of the autoencoder is the communication vector sent to other agents. This structure incentivizes the agent to generate communication vectors that are intrinsically grounded in its observation, enforcing a level of consistency that, the authors hope, makes the communication easier for the other agent to follow and interpret.
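The wiring described above can be sketched minimally as follows. This is an illustrative toy, not the paper's implementation: all names, dimensions, and the linear encoder/decoder are assumptions (the actual networks are learned deep encoders trained with a reconstruction loss), and the training loop is omitted. The point is only the data flow: the autoencoder bottleneck doubles as the broadcast message, and each agent's policy conditions on its own encoding plus the messages it receives.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a flattened 64-dim observation, an 8-dim bottleneck.
OBS_DIM, MSG_DIM = 64, 8

# Randomly initialized autoencoder weights (training omitted for brevity).
W_enc = rng.normal(0.0, 0.1, (MSG_DIM, OBS_DIM))
W_dec = rng.normal(0.0, 0.1, (OBS_DIM, MSG_DIM))

def encode(obs):
    """Bottleneck activation: this same vector is the broadcast message."""
    return np.tanh(W_enc @ obs)

def decode(msg):
    """Reconstruction head; its loss is what grounds messages in observations."""
    return W_dec @ msg

def autoencoder_loss(obs):
    """Mean squared reconstruction error, the grounding objective."""
    return np.mean((decode(encode(obs)) - obs) ** 2)

# Each agent encodes its own observation and broadcasts the bottleneck...
obs_a = rng.normal(size=OBS_DIM)
obs_b = rng.normal(size=OBS_DIM)
msg_a, msg_b = encode(obs_a), encode(obs_b)

# ...and agent A's policy consumes its own encoding plus B's message.
policy_input_a = np.concatenate([msg_a, msg_b])
```

Because the message is produced by the reconstruction objective rather than by the policy gradient, its semantics stay tied to the observed world regardless of how the policy evolves, which is the consistency the paper relies on.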
Empirically, there seems to be fairly compelling evidence that this autoencoder-based form of grounding is more stable, and thus more mutually learnable, than grounding learned from RL alone. The authors even found that adding an RL objective to the autoencoder-based training degraded performance.