WikiReading: A Novel Large-scale Language Understanding Task over Wikipedia
Daniel Hewlett, Alexandre Lacoste, Llion Jones, Illia Polosukhin, Andrew Fandrianto, Jay Han, Matthew Kelcey, David Berthelot
arXiv e-Print archive, 2016
Keywords: cs.CL
First published: 2016/08/11

Abstract: We present WikiReading, a large-scale natural language understanding task and
publicly-available dataset with 18 million instances. The task is to predict
textual values from the structured knowledge base Wikidata by reading the text
of the corresponding Wikipedia articles. The task contains a rich variety of
challenging classification and extraction sub-tasks, making it well-suited for
end-to-end models such as deep neural networks (DNNs). We compare various
state-of-the-art DNN-based architectures for document classification,
information extraction, and question answering. We find that models supporting
a rich answer space, such as word or character sequences, perform best. Our
best-performing model, a word-level sequence to sequence model with a mechanism
to copy out-of-vocabulary words, obtains an accuracy of 71.8%.
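
The headline result rests on a decoder that can copy out-of-vocabulary words from the source document rather than only generating from a fixed vocabulary. Below is a minimal NumPy sketch of one decoding step of such a copy mechanism; it illustrates the general technique, not the paper's exact word-level sequence-to-sequence architecture, and every name, shape, and number here is an assumption made for the example.

```python
# Minimal sketch of a copy-augmented decoding step (illustrative only).
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def copy_augmented_step(vocab_logits, attention_scores, source_tokens,
                        vocab, p_gen):
    """Mix a generation distribution over a fixed vocabulary with a copy
    distribution over source positions, so OOV source words can be output.

    vocab_logits:     (V,) scores over the fixed output vocabulary
    attention_scores: (T,) unnormalized attention over source positions
    source_tokens:    list of T source words (may contain OOV words)
    vocab:            list of V known output words
    p_gen:            scalar in (0, 1): probability of generating vs. copying
    """
    gen_dist = p_gen * softmax(vocab_logits)               # mass on vocab words
    copy_dist = (1.0 - p_gen) * softmax(attention_scores)  # mass on source words

    # Extended vocabulary: known words plus any words seen in the source.
    probs = {w: p for w, p in zip(vocab, gen_dist)}
    for tok, p in zip(source_tokens, copy_dist):
        probs[tok] = probs.get(tok, 0.0) + p  # OOV words receive mass only here
    return probs

# Toy usage: "Polosukhin" is not in the vocabulary, yet copying lets it win.
vocab = ["<unk>", "human", "city", "writer"]
source = ["Illia", "Polosukhin", "is", "a", "researcher"]
out = copy_augmented_step(np.array([0.1, 2.0, 0.0, 0.5]),
                          np.array([0.2, 3.0, 0.0, 0.0, 1.0]),
                          source, vocab, p_gen=0.4)
print(max(out, key=out.get))  # -> "Polosukhin"
```

In this toy run, strong attention on the second source position gives the OOV word "Polosukhin" more probability than any vocabulary word, which is the behavior that lets rich-answer-space models handle extraction sub-tasks the classification models cannot.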