This paper introduces a language model built on a convolutional architecture (gated convolutional networks) instead of recurrent networks such as LSTMs.
## Language modeling
Statistical language models estimate the probability distribution over sequences of words. They are important components of automatic speech recognition (ASR) and machine translation systems. The usual approach is to embed words into $\mathbb{R}^n$ and then apply an RNN to the resulting vector sequences.
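For contrast with the convolutional model, here is a minimal sketch of that conventional recurrent setup, assuming PyTorch; the layer sizes are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class RNNLanguageModel(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)          # words -> R^n
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)              # logits over the next word

    def forward(self, tokens):  # tokens: (batch, time) of word indices
        h, _ = self.rnn(self.embed(tokens))
        return self.out(h)      # (batch, time, vocab_size)
```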
## Evaluation
* [WikiText-103](http://metamind.io/research/the-wikitext-long-term-dependency-language-modeling-dataset/): [Perplexity](https://en.wikipedia.org/wiki/Perplexity) of 44.9 (lower is better; see the worked example after this list)
* new best single-GPU result on the Google Billion Word benchmark: Perplexity of 43.9
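Perplexity is the exponential of the average per-token negative log-likelihood. A quick illustration of the computation; the probabilities below are made up and have nothing to do with the benchmark numbers above.

```python
import math

# Model probabilities assigned to the true next word at each position
# (purely illustrative values).
token_probs = [0.10, 0.25, 0.05, 0.20]
avg_nll = sum(-math.log(p) for p in token_probs) / len(token_probs)
perplexity = math.exp(avg_nll)
print(round(perplexity, 1))  # ~8.0
```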
## Idea
* uses Gated Linear Units (GLUs) as the gating mechanism on top of the convolutions (see the sketch after this list)
* uses pre-activation residual blocks
* adaptive softmax
* no tanh in the gating mechanism (unlike LSTM-style gated tanh units)
* uses gradient clipping
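A minimal sketch of a gated convolutional block with a GLU, assuming PyTorch; the channel count, kernel width, and exact residual placement are illustrative rather than the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedConvBlock(nn.Module):
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        # The convolution emits 2*channels: one half is the linear path A,
        # the other half parameterizes the sigmoid gate B.
        self.conv = nn.Conv1d(channels, 2 * channels, kernel_size)
        self.pad = kernel_size - 1  # left-pad so the model never sees future tokens

    def forward(self, x):  # x: (batch, channels, time)
        residual = x
        h = self.conv(F.pad(x, (self.pad, 0)))
        h = F.glu(h, dim=1)   # GLU: A * sigmoid(B), no tanh involved
        return h + residual   # residual connection around the block
```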
## See also
* [Reddit](https://www.reddit.com/r/MachineLearning/comments/5kbsjb/r_161208083_language_modeling_with_gated/)
* [Improving Neural Language Models with a Continuous Cache](https://arxiv.org/abs/1612.04426): Test perplexity of **40.8 on WikiText-103**