Systematic Generalization: What Is Required and Can It Be Learned?
Dzmitry Bahdanau, Shikhar Murty, Michael Noukhovitch, Thien Huu Nguyen, Harm de Vries, Aaron Courville
arXiv e-Print archive, 2018
Keywords:
cs.CL, cs.AI
First published: 2018/11/30
Abstract: Numerous models for grounded language understanding have been recently
proposed, including (i) generic models that can be easily adapted to any given
task and (ii) intuitively appealing modular models that require background
knowledge to be instantiated. We compare both types of models in how much they
lend themselves to a particular form of systematic generalization. Using a
synthetic VQA test, we evaluate which models are capable of reasoning about all
possible object pairs after training on only a small subset of them. Our
findings show that the generalization of modular models is much more systematic
and that it is highly sensitive to the module layout, i.e. to how exactly the
modules are connected. We furthermore investigate if modular models that
generalize well could be made more end-to-end by learning their layout and
parametrization. We find that end-to-end methods from prior work often learn
inappropriate layouts or parametrizations that do not facilitate systematic
generalization. Our results suggest that, in addition to modularity, systematic
generalization in language understanding may require explicit regularizers or
priors.
The paper studies neural module network trees (NMN-trees): modules are composed in a tree structure to answer a question, and each module appears in many different configurations during training, which pushes it to learn a core, reusable concept and hence to generalize.
Longer summary:
How can we achieve systematic generalization? First we need to ask how
well current models actually understand language. Adversarial examples
show how fragile these models can be, which leads to the conclusion
that systematic generalization is an issue requiring specific
attention.
Perhaps we should rethink the modeling assumptions being made. Samples
may come from different data domains, but they are generated by a
shared set of rules; if a model correctly learned those rules, a
domain shift at test time would not hurt its performance. In practice
we can construct an experiment that introduces a systematic bias into
the training data (here: showing only a small subset of object pairs)
and observe that performance on the held-out combinations suffers.
From this experiment we can start to pin down where the problem lies.
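To make this concrete, here is a minimal sketch of such a biased split in the spirit of the paper's synthetic VQA task: each left-hand object is paired with only a few right-hand objects at training time, and every remaining pair is held out for testing. The object vocabulary, relation names, and question template below are illustrative assumptions, not the authors' actual data generator.

```python
# Sketch of a held-out-pair split: train on a small subset of (X, Y)
# pairs, test on all remaining combinations. Names are hypothetical.
import itertools
import random

OBJECTS = list("ABCDEFGH")                    # hypothetical object vocabulary
RELATIONS = ["left_of", "right_of", "above", "below"]

def split_pairs(objects, rhs_per_lhs=2, seed=0):
    """For each left-hand object, keep only a few right-hand objects for
    training; every remaining (X, Y) pair is reserved for the test set."""
    rng = random.Random(seed)
    all_pairs = [(x, y) for x, y in itertools.product(objects, objects) if x != y]
    train_pairs = set()
    for x in objects:
        candidates = [y for y in objects if y != x]
        train_pairs.update((x, y) for y in rng.sample(candidates, rhs_per_lhs))
    test_pairs = [p for p in all_pairs if p not in train_pairs]
    return sorted(train_pairs), test_pairs

def make_questions(pairs):
    # "Is there a X <relation> a Y?" yes/no questions over the given pairs.
    return [f"Is there a {x} {rel.replace('_', ' ')} a {y}?"
            for (x, y) in pairs for rel in RELATIONS]

train_pairs, test_pairs = split_pairs(OBJECTS)
print(len(train_pairs), "training pairs vs", len(test_pairs), "held-out pairs")
print(make_questions(train_pairs[:1])[0])
```

A model that has learned the underlying relational rules should answer questions about the held-out pairs as accurately as about the training pairs; the gap between the two measures how systematic its generalization is.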
A recent idea for forcing a model to have more independent units is
the neural module network tree (NMN-tree). Here, modules are composed
in a tree structure that mirrors the question, and each module is
trained in many different configurations, which encourages it to learn
a core concept and therefore to generalize.