A Deep Reinforced Model for Abstractive Summarization
Romain Paulus, Caiming Xiong, Richard Socher
arXiv e-Print archive, 2017
Keywords:
cs.CL
First published: 2017/05/11

Abstract: Attentional, RNN-based encoder-decoder models for abstractive summarization
have achieved good performance on short input and output sequences. However,
for longer documents and summaries, these models often include repetitive and
incoherent phrases. We introduce a neural network model with intra-attention
and a new training method. This method combines standard supervised word
prediction and reinforcement learning (RL). Models trained only with the former
often exhibit "exposure bias": they assume the ground truth is provided at each
step during training. However, when standard word prediction is combined with
the global sequence prediction training of RL, the resulting summaries become
more readable. We evaluate this model on the CNN/Daily Mail and New York Times
datasets. Our model obtains a 41.16 ROUGE-1 score on the CNN/Daily Mail
dataset, a 5.7-point absolute improvement over previous state-of-the-art
models. It also performs well as the first abstractive model applied to the
New York Times corpus. Human evaluation further shows that our model produces
higher-quality summaries.
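The training method the abstract describes, combining supervised word prediction with RL, can be sketched as a weighted mixture of two losses: a teacher-forced negative log-likelihood term and a policy-gradient term whose reward compares a sampled summary against a baseline. The sketch below is a minimal illustration under stated assumptions; the function names, the `gamma` weight, and the self-critical-style baseline reward are illustrative choices, not taken from the paper's released code.

```python
def ml_loss(log_probs):
    """Supervised (teacher-forced) objective: mean negative log-likelihood
    of the ground-truth tokens."""
    return -sum(log_probs) / len(log_probs)

def rl_loss(sample_log_probs, sample_reward, baseline_reward):
    """Policy-gradient objective: scale the sampled summary's mean
    log-likelihood by its reward advantage over a baseline summary
    (e.g. a sequence-level metric such as ROUGE; assumption here)."""
    advantage = sample_reward - baseline_reward
    return -advantage * sum(sample_log_probs) / len(sample_log_probs)

def mixed_loss(log_probs, sample_log_probs, sample_reward,
               baseline_reward, gamma=0.98):
    """Convex combination of the RL and ML terms:
    L = gamma * L_rl + (1 - gamma) * L_ml."""
    return (gamma * rl_loss(sample_log_probs, sample_reward, baseline_reward)
            + (1 - gamma) * ml_loss(log_probs))
```

With `gamma` near 1 the sequence-level reward dominates, while the small ML term keeps the model anchored to fluent word-level predictions, which is the readability benefit the abstract claims for the combined objective.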