Incorporating Copying Mechanism in Sequence-to-Sequence Learning
Jiatao Gu, Zhengdong Lu, Hang Li, and Victor O. K. Li
arXiv e-Print archive, 2016
Keywords:
cs.CL, cs.AI, cs.LG, cs.NE
First published: 2016/03/21

Abstract: We address an important problem in sequence-to-sequence (Seq2Seq) learning referred to as copying, in which certain segments in the input sequence are selectively replicated in the output sequence. A similar phenomenon is observable in human language communication. For example, humans tend to repeat entity names or even long phrases in conversation. The challenge with regard to copying in Seq2Seq is that new machinery is needed to decide when to perform the operation. In this paper, we incorporate copying into neural network-based Seq2Seq learning and propose a new model called CopyNet with encoder-decoder structure. CopyNet can nicely integrate the regular way of word generation in the decoder with the new copying mechanism which can choose sub-sequences in the input sequence and put them at proper places in the output sequence. Our empirical study on both synthetic data sets and real world data sets demonstrates the efficacy of CopyNet. For example, CopyNet can outperform regular RNN-based model with remarkable margins on text summarization tasks.
TLDR; The authors introduce CopyNet, a variation on the standard seq2seq architecture that incorporates a "copying mechanism". With this mechanism, the effective vocabulary becomes the union of the standard vocabulary and the words in the current source sentence. CopyNet predicts each output word from a mixture of two modes: the standard attention-based generate mode and a new copy mode that points at source tokens. The authors show empirically, on both toy tasks and text summarization, that CopyNet behaves as expected: the decoder is dominated by the copy mode whenever it replicates a segment from the source.
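To make the mixture concrete, here is a minimal PyTorch sketch of how the two modes can share one output distribution (my own illustration, not the authors' code). The tensor names (`decoder_state`, `enc_states`, `src_ids`), the bilinear copy score, and the extended-vocabulary handling are assumptions for the sketch; the key idea, a single softmax over generate-mode scores plus per-source-position copy scores, follows the paper's formulation.

```python
import torch
import torch.nn.functional as F

def copynet_distribution(decoder_state, enc_states, src_ids,
                         W_gen, W_copy, vocab_size, extended_size):
    """Mix generate-mode and copy-mode scores with a single softmax.

    decoder_state: (batch, hidden)          decoder state s_t
    enc_states:    (batch, src_len, hidden) encoder states h_1..h_L
    src_ids:       (batch, src_len) long    source ids in the *extended*
                                            vocab (OOVs get ids >= vocab_size)
    """
    # Generate mode: one score per word in the standard vocabulary.
    gen_scores = decoder_state @ W_gen                        # (batch, vocab)

    # Copy mode: one score per source position, tanh(h_i W) . s_t
    copy_proj = torch.tanh(enc_states @ W_copy)               # (batch, L, hidden)
    copy_scores = torch.bmm(copy_proj,
                            decoder_state.unsqueeze(2)).squeeze(2)  # (batch, L)

    # One softmax over both score sets, so the two modes compete for
    # probability mass; this is what makes "when to copy" learnable.
    probs = F.softmax(torch.cat([gen_scores, copy_scores], dim=1), dim=1)
    p_gen, p_copy = probs[:, :vocab_size], probs[:, vocab_size:]

    # Scatter copy probabilities onto the extended vocabulary; repeated
    # source tokens accumulate mass, OOV tokens live past vocab_size.
    p_out = torch.zeros(p_gen.size(0), extended_size)
    p_out[:, :vocab_size] = p_gen
    p_out.scatter_add_(1, src_ids, p_copy)
    return p_out

# Shape-only smoke test with random weights (not a trained model).
B, L, H, V = 2, 5, 16, 50
p = copynet_distribution(torch.randn(B, H), torch.randn(B, L, H),
                         torch.randint(0, V + 3, (B, L)),
                         torch.randn(H, V), torch.randn(H, H),
                         vocab_size=V, extended_size=V + 3)
assert torch.allclose(p.sum(dim=1), torch.ones(B))
```

Note the design choice: both score sets are normalized by one shared softmax, so the probability of a word is proportional to exp(generate score) plus the sum of exp(copy scores) over every source position holding that word. Words only in the standard vocabulary get mass from the generate mode alone, source-only (OOV) words from the copy mode alone, and words in both benefit from both, which is exactly why the decoder's copy mode dominates when it reproduces source segments.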