Attention-over-Attention Neural Networks for Reading Comprehension
Yiming Cui, Zhipeng Chen, Si Wei, Shijin Wang, Ting Liu, Guoping Hu
arXiv e-Print archive, 2016
Keywords: cs.CL, cs.NE
First published: 2016/07/15
Abstract: Cloze-style queries are representative problems in reading comprehension. Over the past few months, we have seen much progress in utilizing neural network approaches to solve Cloze-style questions. In this paper, we present a novel model called attention-over-attention reader for the Cloze-style reading comprehension task. Our model aims to place another attention mechanism over the document-level attention and induces "attended attention" for final predictions. Unlike previous works, our neural network model requires fewer pre-defined hyper-parameters and uses an elegant architecture for modeling. Experimental results show that the proposed attention-over-attention model significantly outperforms various state-of-the-art systems by a large margin on public datasets, such as the CNN and Children's Book Test datasets.
TLDR; The authors present a novel Attention-over-Attention (AoA) model for machine comprehension. Given a document and a cloze-style question, the model predicts a single-word answer. The model (see the code sketch after the list):
1. Embeds both the document and the query using a bidirectional GRU
2. Computes a pairwise matching matrix between document and query words
3. Computes query-to-document attention values (a column-wise softmax over the matching matrix, giving a document-level attention distribution per query word)
4. Computes document-to-query attention, averaged over document words (a row-wise softmax followed by averaging, giving one query-level weight per query word)
5. Multiplies the two attentions (a matrix-vector product) to get final attention scores for words in the document
6. Maps the attention scores back into vocabulary space by summing the scores of identical words (attention-sum, as in ASReader)
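To make the six steps concrete, here is a minimal PyTorch sketch of the AoA attention computation. This is not the authors' code; the layer sizes, module names, and the `attention_sum` helper are illustrative assumptions based on the paper's description.

```python
# Minimal sketch of the AoA reader's attention computation (illustrative,
# not the authors' implementation; sizes and names are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AoAReader(nn.Module):
    def __init__(self, vocab_size, emb_dim=384, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Step 1: a shared bidirectional GRU encodes both document and query
        self.gru = nn.GRU(emb_dim, hidden_dim, bidirectional=True, batch_first=True)

    def forward(self, doc, query):
        # doc: (batch, |D|) word ids, query: (batch, |Q|) word ids
        h_doc, _ = self.gru(self.embed(doc))      # (batch, |D|, 2*hidden)
        h_query, _ = self.gru(self.embed(query))  # (batch, |Q|, 2*hidden)

        # Step 2: pairwise matching matrix M[i, j] = h_doc_i . h_query_j
        M = torch.bmm(h_doc, h_query.transpose(1, 2))  # (batch, |D|, |Q|)

        # Step 3: column-wise softmax -> document-level attention per query word
        alpha = F.softmax(M, dim=1)                    # (batch, |D|, |Q|)

        # Step 4: row-wise softmax, then average over document positions
        # -> one query-level attention weight per query word
        beta = F.softmax(M, dim=2).mean(dim=1)         # (batch, |Q|)

        # Step 5: "attended attention" s = alpha . beta (matrix-vector product)
        s = torch.bmm(alpha, beta.unsqueeze(2)).squeeze(2)  # (batch, |D|)
        return s  # final attention score for each document position

# Step 6 (attention-sum): score each candidate word w by summing the
# attention mass of every document position where w occurs.
def attention_sum(s, doc, vocab_size):
    scores = torch.zeros(doc.size(0), vocab_size, device=s.device)
    return scores.scatter_add_(1, doc, s)
```

As in ASReader, the resulting vocabulary-space scores can be trained by maximizing the likelihood of the correct answer word, and the highest-scoring candidate is taken as the prediction at test time.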
The authors evaluate the model on the CNN News and CBTest Question Answering datasets, obtaining state-of-the-art results and beating other models, including EpiReader and ASReader.
#### Notes:
- Very good model visualization in the paper
- I like that this model is much simpler than EpiReader while also performing better