Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks
Jason Weston, Antoine Bordes, Sumit Chopra, Alexander M. Rush, Bart van Merriënboer, Armand Joulin, Tomas Mikolov
arXiv e-Print archive, 2015
Keywords:
cs.AI, cs.CL, stat.ML
First published: 2015/02/19

Abstract: One long-term goal of machine learning research is to produce methods that
are applicable to reasoning and natural language, in particular building an
intelligent dialogue agent. To measure progress towards that goal, we argue for
the usefulness of a set of proxy tasks that evaluate reading comprehension via
question answering. Our tasks measure understanding in several ways: whether a
system is able to answer questions via chaining facts, simple induction,
deduction and many more. The tasks are designed to be prerequisites for any
system that aims to be capable of conversing with a human. We believe many
existing learning systems can currently not solve them, and hence our aim is to
classify these tasks into skill sets, so that researchers can identify (and
then rectify) the failings of their systems. We also extend and improve the
recently introduced Memory Networks model, and show it is able to solve some,
but not all, of the tasks.
#### Introduction
The [paper](http://arxiv.org/pdf/1502.05698v10) presents a framework and a set of synthetic toy tasks (classified into skill sets) for analyzing the performance of different machine learning algorithms.
#### Tasks
* **Single/Two/Three Supporting Facts**: Questions where one or more supporting facts provide the answer. The more supporting facts a question requires, the harder the task.
* **Two/Three Argument Relations**: Requires differentiating between subjects and objects.
* **Yes/No Questions**: True/False questions.
* **Counting/List/Set Questions**: Requires ability to count or list objects having a certain property.
* **Simple Negation and Indefinite Knowledge**: Tests the ability to handle negation constructs and model sentences that describe a possibility and not a certainty.
* **Basic Coreference, Conjunctions, and Compound Coreference**: Requires ability to handle different levels of coreference.
* **Time Reasoning**: Requires understanding the use of time expressions in sentences.
* **Basic Deduction and Induction**: Tests basic deduction and induction via inheritance of properties.
* **Position and Size Reasoning**
* **Path Finding**: Find path between locations.
* **Agent's Motivation**: Why an agent performs an action, i.e., what the state of the agent is.
#### Dataset
* The dataset is available [here](https://research.facebook.com/research/-babi/) and the source code to generate the tasks is available [here](https://github.com/facebook/bAbI-tasks).
* The different tasks are independent of each other.
* For supervised training, the set of relevant statements is provided along with questions and answers.
* The tasks are available in English, in Hindi, and in a version of English with shuffled words.
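In the supervised setting, each bAbI task file numbers the statements of a story and marks question lines with a tab-separated answer and the line numbers of the supporting facts. A minimal parser sketch for that format (the sample story is illustrative):

```python
def parse_babi(lines):
    """Parse bAbI-format lines into (story, question, answer, support) tuples."""
    stories = []
    story = []
    for line in lines:
        idx, text = line.split(" ", 1)
        if int(idx) == 1:           # the line ID resets to 1 at each new story
            story = []
        if "\t" in text:            # question lines: question \t answer \t support IDs
            question, answer, support = text.split("\t")
            support_ids = [int(s) for s in support.split()]
            stories.append((list(story), question.strip(), answer, support_ids))
        else:
            story.append(text.strip())
    return stories

sample = [
    "1 Mary moved to the bathroom.",
    "2 John went to the hallway.",
    "3 Where is Mary?\tbathroom\t1",
]
parsed = parse_babi(sample)  # one (story, question, answer, support) tuple
```

The supporting-fact IDs are exactly the strong supervision used to train the structured SVM and the strongly supervised Memory Networks.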
#### Data Simulation
* The simulated world consists of entities of various types (locations, objects, persons, etc.) and of various actions that operate on these entities.
* These entities have their internal state and follow certain rules as to how they interact with other entities.
* Basic simulations are of the form `<actor> <action> <object>`, e.g., `Bob go school`.
* To add variations, synonyms are used for entities and actions.
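The simulation idea above can be sketched in a few lines: entities carry internal state, actions update that state, and surface forms vary via synonyms. The lexicon and names here are illustrative, not the paper's actual generator:

```python
import random

# Illustrative lexicon; the real bAbI generator uses a richer grammar.
ACTORS = ["Bob", "Alice"]
LOCATIONS = ["school", "kitchen", "park"]
GO_SYNONYMS = ["went to", "travelled to", "moved to"]

def simulate(num_facts, seed=0):
    """Generate <actor> <action> <object> facts while tracking world state."""
    rng = random.Random(seed)
    state = {}                             # actor -> current location (internal state)
    facts = []
    for _ in range(num_facts):
        actor = rng.choice(ACTORS)
        place = rng.choice(LOCATIONS)
        state[actor] = place               # the action updates the world state
        verb = rng.choice(GO_SYNONYMS)     # lexical variation via synonyms
        facts.append(f"{actor} {verb} the {place}.")
    return facts, state
```

Because the generator knows the ground-truth state, it can emit questions and answers (and the supporting fact IDs) for free.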
#### Experiments
##### Methods
* N-gram classifier baseline
* LSTMs
* Memory Networks (MemNNs)
* Structured SVM incorporating externally labeled data
##### Extensions to Memory Networks
* **Adaptive Memories** - learn the number of memory hops to perform instead of using a fixed value of 2 hops.
* **N-grams** - Use a bag of 3-grams instead of a bag-of-words.
* **Nonlinearity** - Apply a 2-layer neural network with a *tanh* nonlinearity in the matching function.
##### Structured SVM
* Uses coreference resolution and semantic role labeling (SRL) which are themselves trained on a large amount of data.
* First trained with strong supervision to find the supporting statements, then a similar SVM is used to produce the response.
##### Results
* Standard MemNNs outperform the N-gram and LSTM baselines but still fail on a number of tasks.
* MemNNs with adaptive memory improve performance on the multiple-supporting-facts tasks and the basic induction task.
* MemNNs with N-gram modeling improve results on tasks where word order matters.
* MemNNs with the nonlinearity perform well on the Yes/No and indefinite knowledge tasks.
* The structured SVM outperforms vanilla MemNNs but is not as good as the MemNNs with modifications.
* The structured SVM performs very well on the path-finding task due to its non-greedy search.
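The advantage of non-greedy search on path finding is easy to see: a greedy matcher commits to one move per hop and cannot backtrack, while a search over the world's connectivity explores alternatives. A minimal breadth-first sketch over an illustrative map (the locations and directions are made up for the example):

```python
from collections import deque

# Illustrative world map: directed edges labelled with the move direction.
EDGES = {
    ("kitchen", "hallway"): "south",
    ("hallway", "kitchen"): "north",
    ("hallway", "garden"): "west",
    ("garden", "hallway"): "east",
}

def find_path(start, goal):
    """Breadth-first search returning the list of directions from start to goal."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        place, moves = queue.popleft()
        if place == goal:
            return moves
        for (a, b), direction in EDGES.items():
            if a == place and b not in visited:
                visited.add(b)
                queue.append((b, moves + [direction]))
    return None          # goal unreachable from start
```

For example, `find_path("kitchen", "garden")` returns `["south", "west"]`: the search considers both hops before committing to an answer, which a one-step greedy matcher cannot do.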