Summary by Hugo Larochelle
This paper presents a neural network architecture that takes as input a question and a sequence of facts expressed in natural language (i.e. as sequences of words) and produces as output the answer to that question. The main components of the architecture are as follows:
* The question (q) and the facts (f_1, ... , f_K) are each individually transformed into a fixed-size vector using the same GRU RNN (with the last hidden state serving as the vector representation).
* These vectors are each passed through "reasoning layers", where each layer transforms the question q and the facts f_k into new vector representations. This is done by feeding each question-fact pair (q, f_k) to a neural network that outputs a new representation for the fact f_k (which replaces its old representation in the layer), as well as a new representation for the question. The K new question representations are then pooled to obtain a single question representation that replaces the old one in the layer (see the sketch after this list).
* The last reasoning layer is either fed to a softmax layer for binary questions, or to a scoring layer for questions with multiple and varying candidate answers.
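To make the forward pass concrete, here is a minimal PyTorch sketch of the variant described above (shared GRU encoder, per-layer interaction networks, pooling over facts). The module names, dimensions, and the choice of max-pooling are my own assumptions, not the authors' implementation; it also fixes the fact representations across layers, which is the version actually tested in the paper.

```python
# A minimal sketch of the Neural Reasoner forward pass (assumptions, not the authors' code).
import torch
import torch.nn as nn

class NeuralReasonerSketch(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hid_dim=128, num_layers=2, num_answers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Shared GRU encoder: the last hidden state is the sentence representation.
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        # One interaction network per reasoning layer, applied to each (question, fact) pair.
        self.reasoners = nn.ModuleList(
            [nn.Sequential(nn.Linear(2 * hid_dim, hid_dim), nn.Tanh())
             for _ in range(num_layers)]
        )
        self.output = nn.Linear(hid_dim, num_answers)  # softmax / scoring layer

    def encode(self, tokens):
        # tokens: (batch, seq_len) -> (batch, hid_dim)
        _, h = self.encoder(self.embed(tokens))
        return h.squeeze(0)

    def forward(self, question_tokens, fact_tokens_list):
        q = self.encode(question_tokens)                       # (batch, hid)
        facts = [self.encode(f) for f in fact_tokens_list]     # K x (batch, hid)
        for layer in self.reasoners:
            # Update the question against every fact; facts stay fixed in this variant.
            updates = [layer(torch.cat([q, f], dim=-1)) for f in facts]
            q = torch.stack(updates, dim=0).max(dim=0).values  # pool over the K facts
        return self.output(q)                                  # logits over candidate answers
```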
This so-called Neural Reasoner can be trained with backpropagation, in an end-to-end, supervised way. The authors also suggest the use of auxiliary tasks to improve results. The first ("original") adds an autoencoder reconstruction cost that reproduces the question and facts from their first-layer encodings. The second ("abstract") instead reconstructs a more abstract version of the sentences (e.g. "The triangle is above the pink rectangle." becomes "x is above y").
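To make the auxiliary objective concrete, here is a hypothetical sketch of how the "original" reconstruction cost could be attached to the encoder above: a small GRU decoder tries to reproduce a sentence's tokens from its encoding, and the resulting cross-entropy is added to the answer loss. The decoder design and the weighting coefficient alpha are my own assumptions.

```python
# A hypothetical reconstruction head for the "original" auxiliary task (assumptions, not the paper's code).
import torch
import torch.nn as nn

class ReconstructionHead(nn.Module):
    def __init__(self, hid_dim, vocab_size):
        super().__init__()
        self.decoder = nn.GRU(hid_dim, hid_dim, batch_first=True)
        self.readout = nn.Linear(hid_dim, vocab_size)

    def forward(self, sentence_vec, target_tokens):
        # Feed the sentence encoding at every decoding step (simplest possible decoder).
        steps = target_tokens.size(1)
        inputs = sentence_vec.unsqueeze(1).repeat(1, steps, 1)  # (batch, steps, hid)
        out, _ = self.decoder(inputs)
        logits = self.readout(out)                              # (batch, steps, vocab)
        return nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), target_tokens.reshape(-1)
        )

# Training objective (sketch): answer loss plus a weighted reconstruction term, e.g.
#   loss = answer_loss + alpha * recon_head(encoded_sentence, sentence_tokens)
```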
Importantly, while the Neural Reasoner framework is presented in this paper as covering many different variants, the version that is experimentally tested is one where the fact representations f_k are actually left unchanged throughout the reasoning layers, with only the question representation being changed.
The paper presents experiments on two synthetic reasoning tasks and reports performance that compares favorably with previously published alternatives (based on the general Memory Network architecture). The experiments also show that the auxiliary tasks can substantially improve the performance of the model.
#### My two cents
The proposed Neural Reasoner framework is actually very close to work published on arXiv at about the same time on End-to-End Memory Networks \cite{conf/nips/SukhbaatarSWF15}. In fact, the version tested in the paper, with unchanged fact representations throughout layers, is extremely close to End-to-End Memory Networks.
That said, there are also lots of differences. For instance, this paper proposes the use of multilayer networks within each Reasoning Layer, to produce updated question representations. In fact, experiments suggest that using several layers can be very beneficial for the path finding task. The sentence representation at the first layer is also different, being based on a non-linear RNN instead of being based on linear operations on embeddings as in Memory Networks.
The most interesting aspect of this paper to me is probably the demonstration that the use of an auxiliary task such as "original", which is unsupervised, can substantially improve the performance, again for the path finding task. That is, to me, probably the most exciting direction of future research that this paper highlights as promising.
I also liked how the model is presented. It didn't take me much time to understand the model, and I actually found it easier to absorb than the Memory Network model, despite the two being very similar. I think this model is indeed a bit simpler than Memory Networks, which is a good thing. It also suggests a different approach to the problem, one where the fact representations are also updated during forward propagation, not just the question's representation (which is the version initially described in the paper... I hope experiments on that variant are eventually presented).
It's unfortunate that the authors only performed experiments on 2 of the 20 synthetic question-answering tasks. I hope a future version of this work can report results on the full benchmark and directly compare with End-to-End Memory Networks.
I was also unable to find out which of the question representation pooling mechanisms (Section 3.2.2) was used in the experiments. Perhaps the authors forgot to state it?
Overall, a pretty interesting paper that opens different doors towards reasoning with neural networks.