Latent Predictor Networks for Code Generation
Ling, Wang
Grefenstette, Edward
Hermann, Karl Moritz
Kociský, Tomás
Senior, Andrew
Wang, Fumin
Blunsom, Phil
arXiv e-Print archive - 2016 via Local Bibsonomy
This paper presents a conditional generative model of text, in which text can be generated either one character at a time or by copying whole chunks of characters directly from the input into the output. At each generation step, the model decides which of these two modes to use, mixing them as needed to produce a correct output. The authors refer to this structure as Latent Predictor Networks. The character-level generation component is based on a simple output softmax over characters, while the generation-by-copy component is based on the Pointer Network architecture \cite{conf/nips/VinyalsFJ15}. Critically, the authors highlight that the choice of component at each step can be marginalized out by dynamic programming, as in semi-Markov models \cite{conf/nips/SarawagiC04}.
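To make the marginalization concrete, here is a minimal toy sketch in Python. It is not the authors' implementation. It assumes fixed, context-independent probabilities in place of network outputs, and the names `marginal_log_prob`, `copy_candidates`, `char_probs` and `p_copy` are hypothetical. It sums over every way of producing a target string, where each step either emits one character or copies a whole candidate string from the input, using a semi-Markov style forward recursion.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-probabilities."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def marginal_log_prob(target, copy_candidates, char_probs, p_copy):
    """Log-probability of `target`, summed over all ways of building it.

    Toy assumptions (not from the paper): every probability is a fixed number.
      - p_copy: probability of choosing the copy predictor at a step (0 < p_copy < 1)
      - copy_candidates: maps an input string to its pointer weight
      - char_probs: maps a character to its softmax probability
    """
    n = len(target)
    # alpha[i] = log-probability of having produced target[:i] by any path.
    alpha = [-math.inf] * (n + 1)
    alpha[0] = 0.0
    for i in range(1, n + 1):
        paths = []
        # Option 1: the last step generated the single character target[i-1].
        c = target[i - 1]
        if char_probs.get(c, 0.0) > 0.0:
            paths.append(alpha[i - 1] + math.log((1.0 - p_copy) * char_probs[c]))
        # Option 2: the last step copied a candidate ending exactly at position i.
        for cand, weight in copy_candidates.items():
            j = i - len(cand)
            if cand and weight > 0.0 and j >= 0 and target[j:i] == cand:
                paths.append(alpha[j] + math.log(p_copy * weight))
        if paths:
            alpha[i] = logsumexp(paths)
    return alpha[n]
```

For example, with `target="ab"`, `copy_candidates={"ab": 1.0}`, `char_probs={"a": 0.5, "b": 0.5}` and `p_copy=0.5`, two paths exist: copying "ab" in one step (0.5) or generating "a" then "b" (0.25 × 0.25), so the marginal probability is approximately 0.5625. The recursion visits each target position once per candidate instead of enumerating all segmentations explicitly.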
One motivating application is machine translation, where the input might contain named entities that should simply be copied directly to the output. However, the authors experiment on a different problem: generating code that implements the action of a card in the trading card games Magic: The Gathering and Hearthstone. In this application, copying is useful for tasks such as reproducing the name of the card or the numeric values of its effects.
In addition to the Latent Predictor Network structure, the proposed model for this application includes a slightly adapted form of soft attention as well as character-aware word embeddings as in \cite{conf/emnlp/LingDBTFAML15}. The authors also experiment with a compression procedure on the target programs, which can help reduce the size of the output space.
Experiments show that the proposed neural network approach outperforms a variety of strong baselines (including systems based on machine translation or information retrieval).
TLDR; The authors demonstrate how to condition on several predictors when generating text/code. For example, one may need to copy inputs or perform database lookups to produce good results, but training multiple predictors end-to-end is challenging. The authors propose Latent Predictor Networks, which combine attention-based character generation with pointer networks that copy tokens from the input. The authors evaluate their model on the task of producing code for trading card games like Magic and Hearthstone, where a card's specification (its text fields and attributes, as shown on the card) is the input and the code implementation of the card is the output. Latent Predictor Networks clearly beat seq2seq and attention-based baselines.