A simple neural network module for relational reasoning
Adam Santoro, David Raposo, David G. T. Barrett, Mateusz Malinowski, Razvan Pascanu, Peter Battaglia, Timothy Lillicrap
arXiv e-Print archive - 2017
Keywords:
cs.CL, cs.LG
First published: 2017/06/05
Abstract: Relational reasoning is a central component of generally intelligent
behavior, but has proven difficult for neural networks to learn. In this paper
we describe how to use Relation Networks (RNs) as a simple plug-and-play module
to solve problems that fundamentally hinge on relational reasoning. We tested
RN-augmented networks on three tasks: visual question answering using a
challenging dataset called CLEVR, on which we achieve state-of-the-art,
super-human performance; text-based question answering using the bAbI suite of
tasks; and complex reasoning about dynamic physical systems. Then, using a
curated dataset called Sort-of-CLEVR we show that powerful convolutional
networks do not have a general capacity to solve relational questions, but can
gain this capacity when augmented with RNs. Our work shows how a deep learning
architecture equipped with an RN module can implicitly discover and learn to
reason about entities and their relations.
The paper proposes a reusable neural network module to `reason about the relations between entities and their properties`:
$$ RN(O) = f_\phi \left( \sum_{i,j} g_\theta(o_i, o_j) \right), $$
- $O$ is a set of input objects $\{o_1, o_2, \dots, o_n\}$, $o_i \in \mathbb{R}^m$
- $g_\theta$ is a neural network (MLP) that approximates the object-to-object relation function
- $f_\phi$ is a neural network (MLP) that transforms the summed pairwise relations into the desired output
RNs operate on sets (because of the summation in the formula) and are therefore invariant to the order of the input objects.
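The formula above can be sketched in a few lines of NumPy. The random-weight MLPs, layer sizes, and object dimensions below are illustrative placeholders for learned parameters, not the paper's actual configuration; the sketch is only meant to show the pairwise sum and the resulting order invariance:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    """Random-weight MLP with ReLU hidden layers (weights stand in for learned parameters)."""
    Ws = [rng.normal(0, 0.1, (a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
    def forward(x):
        for W in Ws[:-1]:
            x = np.maximum(x @ W, 0.0)
        return x @ Ws[-1]
    return forward

m, hidden, out = 8, 32, 4               # object dim, relation dim, output dim (assumed)
g_theta = mlp([2 * m, hidden, hidden])  # relation function on concatenated object pairs
f_phi   = mlp([hidden, hidden, out])    # maps the summed relations to the output

def relation_network(O):
    """RN(O) = f_phi( sum_{i,j} g_theta(o_i, o_j) ) over all ordered pairs."""
    pairs = [np.concatenate([oi, oj]) for oi in O for oj in O]
    summed = np.sum([g_theta(p) for p in pairs], axis=0)
    return f_phi(summed)

O = [rng.normal(size=m) for _ in range(5)]  # a set of 5 objects
perm = [O[i] for i in rng.permutation(5)]   # same set, different order
assert np.allclose(relation_network(O), relation_network(perm))  # order-invariant
```

Because every ordered pair is summed before $f_\phi$ is applied, permuting the input objects only reorders the terms of the sum, which is why the final assertion holds.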
Architecturally, the RN module sits at the tail of a network, taking input objects in the form of CNN or LSTM embeddings.
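For the visual tasks, the object set can be formed by treating each spatial cell of a CNN feature map as one object and tagging it with its position. The sketch below illustrates this idea; the coordinate normalization and dimensions are illustrative assumptions, not the paper's exact preprocessing:

```python
import numpy as np

def feature_map_to_objects(fmap):
    """Treat each spatial cell of a CNN feature map (H, W, C) as one object,
    appending its (row, col) coordinates so the relation function can use position."""
    H, W, C = fmap.shape
    objects = []
    for r in range(H):
        for c in range(W):
            # Normalized position tag (assumed scheme); guards against H or W == 1.
            coords = np.array([r / max(H - 1, 1), c / max(W - 1, 1)])
            objects.append(np.concatenate([fmap[r, c], coords]))
    return objects  # H*W objects, each of length C + 2

fmap = np.random.default_rng(1).normal(size=(4, 4, 16))  # stand-in CNN output
objs = feature_map_to_objects(fmap)
# 16 objects of dimension 18 would feed the RN's pairwise relation function
```

This avoids any explicit object detection: the RN learns which cells behave like entities purely from the pairwise relation function.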
This approach is evaluated on several tasks, where it achieves strong (on CLEVR, even super-human) performance:
- CLEVR and Sort-of-CLEVR - question answering about an image
- bAbI - text-based question answering
- Dynamic physical systems - MuJoCo simulations with physical relations between entities