On the State of the Art of Evaluation in Neural Language Models
Gábor Melis, Chris Dyer, and Phil Blunsom
arXiv e-Print archive, 2017
Keywords: cs.CL
First published: 2017/07/18
Abstract: Ongoing innovations in recurrent neural network architectures have provided a steady influx of apparently state-of-the-art results on language modelling benchmarks. However, these have been evaluated using differing code bases and limited computational resources, which represent uncontrolled sources of experimental variation. We reevaluate several popular architectures and regularisation methods with large-scale automatic black-box hyperparameter tuning and arrive at the somewhat surprising conclusion that standard LSTM architectures, when properly regularised, outperform more recent models. We establish a new state of the art on the Penn Treebank and Wikitext-2 corpora, as well as strong baselines on the Hutter Prize dataset.
A comparison of three recurrent architectures for language modelling: LSTMs, Recurrent Highway Networks, and the NAS architecture. Each model goes through a substantial black-box hyperparameter search under the constraint that the total number of parameters is kept constant. The authors conclude that basic LSTMs, when properly regularised, still outperform the more recent architectures, achieving state-of-the-art perplexities on Penn Treebank and Wikitext-2.
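
The fixed-parameter-budget constraint can be made concrete with a small sketch. The code below is a minimal illustration, not the paper's actual tuning code: the single-layer untied LSTM, the function names, the embedding size, and the 10M budget are all assumptions made here for clarity. The idea it demonstrates is that, for each hyperparameter sample drawn by the tuner, the hidden size can be set to the largest value whose total parameter count still fits the budget, so that all architectures are compared at equal capacity.

```python
def lstm_lm_params(vocab_size, embed_dim, hidden_dim):
    """Rough parameter count for a one-layer LSTM language model
    (untied input embedding and output softmax)."""
    embedding = vocab_size * embed_dim
    # Four gates (input, forget, cell, output), each with input-to-hidden
    # and hidden-to-hidden weight matrices plus a bias vector.
    lstm = 4 * (embed_dim * hidden_dim + hidden_dim ** 2 + hidden_dim)
    softmax = hidden_dim * vocab_size + vocab_size
    return embedding + lstm + softmax


def hidden_size_for_budget(budget, vocab_size, embed_dim, hi=8192):
    """Largest hidden size whose model fits the parameter budget.
    Binary search works because the count is monotone in hidden_dim."""
    lo = 1
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if lstm_lm_params(vocab_size, embed_dim, mid) <= budget:
            lo = mid
        else:
            hi = mid - 1
    return lo


if __name__ == "__main__":
    # Hypothetical setting: a 10M-parameter budget with the 10k-word
    # Penn Treebank vocabulary and a sampled embedding size of 400.
    h = hidden_size_for_budget(10_000_000, vocab_size=10_000, embed_dim=400)
    print(h, lstm_lm_params(10_000, 400, h))  # ~447 hidden units
```

With the budget enforced this way, any perplexity difference between architectures reflects the architecture and its regularisation rather than raw model size, which is what lets the paper attribute its results to the LSTM itself.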