Dynamic Memory Networks for Visual and Textual Question Answering
Caiming Xiong, Stephen Merity, and Richard Socher
arXiv e-Print archive, 2016
Keywords:
cs.NE, cs.CL, cs.CV
First published: 2016/03/04
Abstract: Neural network architectures with memory and attention mechanisms exhibit certain reasoning capabilities required for question answering. One such architecture, the dynamic memory network (DMN), obtained high accuracy on a variety of language tasks. However, it was not shown whether the architecture achieves strong results for question answering when supporting facts are not marked during training, or whether it could be applied to other modalities such as images. Based on an analysis of the DMN, we propose several improvements to its memory and input modules. Together with these changes, we introduce a novel input module for images in order to answer visual questions. Our new DMN+ model improves the state of the art on both the Visual Question Answering dataset and the bAbI-10k text question-answering dataset without supporting fact supervision.
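
To make the episodic-memory idea behind the DMN family concrete, below is a minimal NumPy sketch of one plausible reading: the model attends over encoded input "facts" conditioned on the question and the current memory, forms a context vector, and updates the memory over several hops. This is an illustrative sketch only, not the exact DMN+ architecture (which, per the paper, uses an input fusion layer and an attention-based GRU); the parameter names (W_att, W_mem), dimensions, and the simple soft-attention sum are assumptions made for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy dimensions (hypothetical, chosen only for illustration).
d = 8          # hidden size of fact / question / memory vectors
num_facts = 5  # number of encoded input "facts"
num_hops = 3   # number of episodic memory passes

facts = rng.standard_normal((num_facts, d))   # stand-ins for input-module outputs
question = rng.standard_normal(d)             # stand-in for the question encoding q

# Hypothetical parameters of the attention scorer and the memory update.
W_att = rng.standard_normal((1, 4 * d)) * 0.1
W_mem = rng.standard_normal((d, 3 * d)) * 0.1

memory = question.copy()  # initialise memory m^0 with the question
for hop in range(num_hops):
    # Interaction features between each fact f_i, the question q, and memory m.
    z = np.concatenate(
        [facts * question, facts * memory,
         np.abs(facts - question), np.abs(facts - memory)], axis=1)
    gates = softmax((W_att @ z.T).ravel())   # attention weights over facts
    context = gates @ facts                  # soft context vector for this hop
    # ReLU memory update conditioned on previous memory, context, and question.
    memory = np.maximum(
        0.0, W_mem @ np.concatenate([memory, context, question]))

print("attention gates on last hop:", np.round(gates, 3))
print("final memory vector shape:", memory.shape)
```

In this sketch the final memory vector would be handed to an answer module; the repeated hops are what let such models chain together multiple supporting facts without those facts being marked during training.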