Neural Generation of Regular Expressions from Natural Language with Minimal Domain Knowledge
Nicholas Locascio, Karthik Narasimhan, Eduardo DeLeon, Nate Kushman, Regina Barzilay
arXiv e-Print archive, 2016
Keywords: cs.CL, cs.AI
First published: 2016/08/09

Abstract: This paper explores the task of translating natural language queries into regular expressions which embody their meaning. In contrast to prior work, the proposed neural model does not utilize domain-specific crafting, learning to translate directly from a parallel corpus. To fully explore the potential of neural models, we propose a methodology for collecting a large corpus of regular expression, natural language pairs. Our resulting model achieves a performance gain of 19.6% over previous state-of-the-art models.
#### Introduction
* Tackles the task of translating natural language queries into regular expressions without using domain-specific knowledge.
* Proposes a methodology for collecting a large corpus of (regular expression, natural language) pairs.
* Reports a performance gain of 19.6% over the previous state-of-the-art model.
* [Link to the paper](http://arxiv.org/abs/1608.03000v1)
#### Architecture
* LSTM-based sequence-to-sequence neural network with attention (a minimal sketch follows this list)
* Six layers:
* One word-embedding layer
* Two encoder layers
* Two decoder layers
* One dense output layer
* Attention over the encoder states
* Dropout with probability 0.25
* Trained for 20 epochs with a minibatch size of 32 and a learning rate of 1.0 (decay rate of 0.5)
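The paper reports layer counts and hyperparameters but this summary includes no reference implementation, so below is a minimal PyTorch sketch of an architecture matching the description. The hidden sizes, vocabulary sizes, the dot-product attention variant, and the learning-rate decay schedule are all assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

class Seq2SeqRegex(nn.Module):
    """Embedding -> 2-layer LSTM encoder -> 2-layer LSTM decoder
    (attending over encoder states) -> dense output, per the summary."""
    def __init__(self, src_vocab, tgt_vocab, emb_dim=128, hidden=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)  # word-embedding layer
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden, num_layers=2,
                               dropout=0.25, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hidden, num_layers=2,
                               dropout=0.25, batch_first=True)
        self.out = nn.Linear(hidden * 2, tgt_vocab)      # dense output layer
        self.drop = nn.Dropout(0.25)

    def forward(self, src, tgt):
        enc_out, state = self.encoder(self.drop(self.src_emb(src)))
        dec_out, _ = self.decoder(self.drop(self.tgt_emb(tgt)), state)
        # Dot-product attention over encoder states (the exact attention
        # variant used in the paper is not specified in this summary).
        weights = torch.softmax(torch.bmm(dec_out, enc_out.transpose(1, 2)), -1)
        context = torch.bmm(weights, enc_out)
        return self.out(torch.cat([dec_out, context], dim=-1))

# Training setup from the summary: SGD, learning rate 1.0, decay factor 0.5
# (vocabulary sizes and the decay schedule here are assumptions).
model = Seq2SeqRegex(src_vocab=5000, tgt_vocab=100)
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.5)
```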
#### Dataset Generation
* Created a public dataset, **NL-RX**, with 10K (regular expression, natural language) pairs.
* Two-step generate-and-paraphrase approach:
* Generate step
* Use a handcrafted grammar to translate regular expressions into rigid, synthetic natural-language descriptions (see the sketch after this list).
* Paraphrase step
* Crowdsource the task of rewriting the rigid descriptions into more natural language.
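To make the generate step concrete, here is a toy handcrafted grammar that renders a small regex AST as a rigid English description. The rules and AST encoding are illustrative inventions, not the paper's actual grammar, which covers a much larger set of constructs.

```python
# Each rule maps a regex operator to a rigid English template.
RULES = {
    "concat": "{} followed by {}",
    "star":   "{} , zero or more times",
    "or":     "either {} or {}",
    "vowel":  "a vowel",
    "num":    "a number",
}

def describe(node):
    """Recursively render an AST node as stilted synthetic English."""
    op, *args = node
    return RULES[op].format(*(describe(a) for a in args))

# ("concat", ("vowel",), ("star", ("num",)))  ~  [AEIOUaeiou][0-9]*
print(describe(("concat", ("vowel",), ("star", ("num",)))))
# -> "a vowel followed by a number , zero or more times"
```

Crowdworkers would then paraphrase such output into something like "a vowel before any amount of numbers", yielding the final (regular expression, natural language) pair.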
#### Results
* Evaluation Metric
* Functional equality check (called DFA-Equal), since the same regular expression can be written in many syntactically different ways (an approximate sketch follows this list).
* The proposed architecture outperforms both baselines under DFA-Equal: a Nearest Neighbor classifier using Bag of Words (BoWNN) and Semantic-Unify.
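DFA-Equal judges two regexes by the language they accept rather than by their surface syntax. The paper computes exact DFA equivalence; as a lightweight stand-in, the stdlib-only sketch below exhaustively compares `fullmatch` behavior over all strings up to a length bound, which is an approximation (a necessary but not sufficient test), not true equivalence.

```python
import itertools
import re

def approx_equal(pattern_a, pattern_b, alphabet="ab0", max_len=6):
    """Return False on the first string the two regexes disagree on,
    True if they agree on every string up to max_len over the alphabet."""
    ra, rb = re.compile(pattern_a), re.compile(pattern_b)
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(ra.fullmatch(s)) != bool(rb.fullmatch(s)):
                return False
    return True

print(approx_equal(r"a+", r"aa*"))  # True: same language, different syntax
print(approx_equal(r"a+", r"a*"))   # False: they disagree on the empty string
```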