Summary by Shagun Sodhani
#### Introduction
* Build a supervised reading comprehension dataset from a news corpus.
* Compare the performance of neural models against state-of-the-art traditional NLP pipelines on the reading comprehension task.
* [Link to the paper](http://arxiv.org/abs/1506.03340v3)
#### Reading Comprehension
* Estimate conditional probability $p(a|c, q)$, where $c$ is a context document, $q$ is a query related to the document, and $a$ is the answer to that query.
#### Dataset Generation
* Use online newspapers (CNN and Daily Mail) together with their matching bullet-point summaries.
* Parse the bullet-point summaries into Cloze-style questions.
* Generate a corpus of document-query-answer triples by replacing one entity at a time with a placeholder.
* Anonymise and randomise the data using a coreference system, abstract entity markers, and random permutation of those markers (a minimal sketch of this step follows the list).
* The processed dataset evaluates reading comprehension more directly, as models cannot exploit world knowledge or co-occurrence statistics about the entities.
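A minimal sketch of the anonymisation step, assuming entity mentions have already been grouped into coreference clusters upstream; the function and argument names here are hypothetical, not the paper's code:

```python
import random

def anonymise(document, query, answer, entity_clusters):
    """Replace every entity with an abstract marker (@entity0, @entity1, ...)
    whose assignment is randomly permuted per example, so models cannot
    exploit world knowledge or co-occurrence statistics about named entities."""
    markers = [f"@entity{i}" for i in range(len(entity_clusters))]
    random.shuffle(markers)  # fresh random permutation for this example

    # Map every surface mention of an entity to its cluster's marker.
    mapping = {}
    for marker, cluster in zip(markers, entity_clusters):
        for mention in cluster:
            mapping[mention] = marker

    def substitute(text):
        # Replace longer mentions first so "Barack Obama" wins over "Obama".
        for mention in sorted(mapping, key=len, reverse=True):
            text = text.replace(mention, mapping[mention])
        return text

    return substitute(document), substitute(query), mapping.get(answer, answer)
```

Because the marker permutation changes per example, the same real-world entity maps to different markers across documents, so only the given context document can identify the answer.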
#### Models
##### Baseline Models
* **Majority Baseline**
* Picks the most frequently observed entity in the context document.
* **Exclusive Majority**
* Picks the most frequently observed entity in the context document which is not observed in the query.
##### Symbolic Matching Models
* **Frame-Semantic Parsing**
* Parse sentences to identify predicates and their arguments, i.e. answer questions like "who did what to whom".
* Extract entity-predicate triples $(e_1, V, e_2)$ from both the query $q$ and the context document $d$.
* Resolve queries using a sequence of rules such as `exact match` (the query triple, with an entity aligned to the placeholder, appears verbatim among the document's triples), `matching entity`, etc.
* **Word Distance Benchmark**
* Align the placeholder of the Cloze-form question with each possible entity in the context document and measure how well the question aligns with the context surrounding that entity.
* Score each candidate by summing the distance from every word in $q$ to its nearest aligned word in $d$ (a minimal sketch follows this list).
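A minimal sketch of this heuristic, assuming tokenised inputs and word-identity alignment; the distance cap and function names are assumptions, not the paper's implementation:

```python
def word_distance(doc_tokens, query_tokens, entity_pos, max_penalty=8):
    """Score the alignment of the Cloze placeholder with the entity at
    entity_pos: sum, over every query word, the distance to its nearest
    occurrence in the document, capped at max_penalty (the cap value
    here is an assumption)."""
    total = 0
    for q_tok in query_tokens:
        positions = [i for i, d_tok in enumerate(doc_tokens) if d_tok == q_tok]
        nearest = min((abs(i - entity_pos) for i in positions), default=max_penalty)
        total += min(nearest, max_penalty)
    return total

def answer(doc_tokens, query_tokens, candidate_entities):
    """Pick the candidate entity whose best-scoring occurrence minimises
    the total word distance (lower is better)."""
    occurrences = [i for i, tok in enumerate(doc_tokens) if tok in candidate_entities]
    best = min(occurrences, key=lambda i: word_distance(doc_tokens, query_tokens, i))
    return doc_tokens[best]
```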
##### Neural Network Models
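* All the neural models below share the same output layer (from the paper): the joint document-query embedding $g(d, q)$ is scored against every candidate word, $$p(a|d, q) \propto \exp(W(a)\, g(d, q)), \quad a \in V,$$ where $W(a)$ is the row of a weight matrix $W$ corresponding to answer $a$ and $V$ is the vocabulary; the models differ only in how they compute $g(d, q)$.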
* **Deep LSTM Reader**
* Test the ability of Deep LSTM encoders to handle significantly longer sequences.
* Feed the document-query pair as a single long sequence, one word at a time, separated by a delimiter.
* Use a Deep LSTM cell with skip connections from the input to every hidden layer and from every hidden layer to the output (sketched below).
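A minimal PyTorch sketch of this reader; the layer sizes are illustrative, and the paper's skip connections are omitted for brevity:

```python
import torch
import torch.nn as nn

class DeepLSTMReader(nn.Module):
    """Sketch: the document and query are concatenated (with a delimiter
    token) and fed to a deep LSTM one word at a time; the output after
    the last token serves as the joint embedding g(d, q)."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            num_layers=layers, batch_first=True)
        self.score = nn.Linear(hidden_dim, vocab_size)  # rows act as W(a)

    def forward(self, doc_query_ids):          # (batch, |d| + 1 + |q|)
        x = self.embed(doc_query_ids)
        outputs, _ = self.lstm(x)
        g = outputs[:, -1, :]                   # state after the last token
        return self.score(g)                    # unnormalised log p(a|d, q)
```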
* **Attentive Reader**
* Employ attention model to overcome the bottleneck of fixed width hidden vector.
* Encode the document and the query separately, each with a single-layer bidirectional LSTM.
* The query encoding is the concatenation of the final forward and backward outputs.
* The document encoding is a weighted sum of the output vectors (each the concatenation of the forward and backward outputs at a token).
* The weights can be interpreted as the degree to which the network attends to a particular token in the document.
* The model is completed by defining a non-linear combination of the document and query embeddings (a sketch of the attention step follows).
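A PyTorch sketch of the attention step, following the paper's $m(t)$, $s(t)$, $r$, $g$ formulation; the layer sizes and batch-first layout are assumptions:

```python
import torch
import torch.nn as nn

class AttentiveReader(nn.Module):
    """Sketch: the query embedding u attends over the bidirectional
    document outputs y(t); the document representation r is their
    attention-weighted sum, combined non-linearly with u to give g."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.doc_lstm = nn.LSTM(embed_dim, hidden_dim,
                                bidirectional=True, batch_first=True)
        self.qry_lstm = nn.LSTM(embed_dim, hidden_dim,
                                bidirectional=True, batch_first=True)
        dim = 2 * hidden_dim
        self.Wym = nn.Linear(dim, dim, bias=False)   # mixes doc tokens
        self.Wum = nn.Linear(dim, dim, bias=False)   # mixes the query
        self.wms = nn.Linear(dim, 1, bias=False)     # scores each token
        self.Wrg = nn.Linear(dim, dim, bias=False)
        self.Wug = nn.Linear(dim, dim, bias=False)
        self.score = nn.Linear(dim, vocab_size)

    def forward(self, doc_ids, qry_ids):
        y, _ = self.doc_lstm(self.embed(doc_ids))        # (B, |d|, 2h)
        q_out, _ = self.qry_lstm(self.embed(qry_ids))
        h = q_out.size(2) // 2
        # Query embedding u: final forward output + final backward output.
        u = torch.cat([q_out[:, -1, :h], q_out[:, 0, h:]], dim=-1)
        m = torch.tanh(self.Wym(y) + self.Wum(u).unsqueeze(1))
        s = torch.softmax(self.wms(m).squeeze(-1), dim=1)  # attention weights
        r = (s.unsqueeze(-1) * y).sum(dim=1)               # doc embedding
        g = torch.tanh(self.Wrg(r) + self.Wug(u))          # joint embedding
        return self.score(g)
```

The weighted sum lets the document representation depend on the query, avoiding the fixed-width bottleneck of the Deep LSTM Reader.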
* **Impatient Reader**
* An extension of the Attentive Reader: the model can re-read the document as each query token is read.
* It accumulates information from the document as each query token is seen, and finally outputs a joint document-query representation as a non-linear combination of the document and query embeddings (see the recurrence sketch below).
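A sketch of the re-reading recurrence for a single example; the weight modules are passed in as hypothetical layers, and this follows the paper's recurrent-attention idea rather than its exact parameterisation:

```python
import torch

def impatient_read(y_d, y_q, Wdm, Wrm, Wqm, w_ms, Wrr):
    """Re-attend over the document outputs y_d (|d|, dim) after reading
    each query token embedding in y_q (|q|, dim), accumulating the
    document representation r step by step."""
    r = torch.zeros(y_d.size(-1))
    for i in range(y_q.size(0)):
        # Attention over document tokens, conditioned on the current
        # query token and the evidence accumulated so far.
        m = torch.tanh(Wdm(y_d) + Wrm(r) + Wqm(y_q[i]))  # (|d|, dim)
        s = torch.softmax(m @ w_ms, dim=0)               # (|d|,)
        # Weighted read from the document plus a recurrent carry of r.
        r = y_d.t() @ s + torch.tanh(Wrr(r))
    return r  # combined non-linearly with the query embedding to form g(d, q)
```

Here `Wdm`, `Wrm`, `Wqm`, and `Wrr` would be `nn.Linear(dim, dim, bias=False)` modules and `w_ms` a learned vector of length `dim`.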
#### Results
* The Attentive and Impatient Readers outperform all other models, highlighting the benefits of attention modelling.
* The Frame-Semantic pipeline does not scale to cases where answering a query requires combining several extraction rules.
* It also provides poor coverage, as many relations do not adhere to the default predicate-argument structure.
* The Word Distance approach outperformed the Frame-Semantic approach, since there is significant lexical overlap between queries and documents.
* The paper also includes heat maps over the context documents to visualise the attention mechanism.
