Early Stopping without a Validation Set on ShortScience.org

Early Stopping without a Validation Set
Maren Mahsereci and Lukas Balles and Christoph Lassner and Philipp Hennig
arXiv e-Print archive - 2017 via Local arXiv
Keywords: cs.LG, stat.ML

Summary by Martin Thoma 8 years ago

Summary from [reddit](https://www.reddit.com/r/MachineLearning/comments/623oq4/r_early_stopping_without_a_validation_set/dfjzwqq/):

We want to minimize the expected risk (loss), but that is a mean over the true data distribution, which we don't know. We approximate it with a finite dataset and minimize the empirical risk instead.
The gradients of the empirical risk approximate the gradients of the expected risk.
The idea is that the true gradients contain only information, whereas the approximate gradients contain information plus noise. The noise comes from using a finite dataset to approximate the true data distribution.
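
To make the "information + noise" picture concrete, here is one way to write it down. The notation ($w$ for the parameters, $\ell$ for the per-example loss, $p$ for the data distribution, $N$ for the dataset size, $\Sigma$ for the covariance of per-example gradients) is my own choice for illustration and is not taken from the summary above.

$$
\mathcal{L}(w) = \mathbb{E}_{x \sim p}\big[\ell(w, x)\big],
\qquad
L_D(w) = \frac{1}{N} \sum_{i=1}^{N} \ell(w, x_i)
$$

$$
\nabla L_D(w) = \nabla \mathcal{L}(w) + \varepsilon,
\qquad
\mathbb{E}[\varepsilon] = 0,
\qquad
\operatorname{cov}(\varepsilon) = \frac{\Sigma(w)}{N}
$$

The last equality assumes the $x_i$ are drawn i.i.d. from $p$. Near a minimum of the expected risk, $\nabla \mathcal{L}(w)$ shrinks towards zero while the noise term does not, so the empirical gradient ends up being mostly noise.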
By computing local statistics of the gradients, the authors can detect when the gradients no longer carry information about the expected risk and what is left is just noise. If we keep optimizing past that point, we are going to overfit.
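
The kind of local gradient statistic described above can be sketched as a simple signal-to-noise test. This is a simplified illustration under my own assumptions, not necessarily the exact criterion from the paper: it takes per-example gradients, compares the squared mean gradient of each parameter with its sample variance, and signals a stop once the mean gradient is no larger than what pure sampling noise would produce. The names `evidence_statistic` and `should_stop` are hypothetical.

```python
from statistics import mean, variance


def evidence_statistic(per_example_grads, eps=1e-12):
    """Return 1 - (n / d) * sum_k mean_k^2 / var_k.

    per_example_grads: list of n gradients, each a list of d floats.
    If the true gradient were zero, n * mean_k^2 / var_k would be about 1
    on average, so the statistic would hover around 0; strongly negative
    values mean the gradient still carries signal.
    """
    n = len(per_example_grads)
    d = len(per_example_grads[0])
    total = 0.0
    for k in range(d):
        column = [g[k] for g in per_example_grads]
        g_mean = mean(column)
        g_var = variance(column)  # sample variance, needs n >= 2
        total += g_mean ** 2 / (g_var + eps)  # eps avoids division by zero
    return 1.0 - (n / d) * total


def should_stop(per_example_grads):
    """Stop once the mean gradient is indistinguishable from noise."""
    return evidence_statistic(per_example_grads) > 0.0


# Toy data: consistent gradients (signal) vs. gradients that cancel out (noise).
signal = [[1.0, 2.0], [1.1, 1.9], [0.9, 2.1], [1.0, 2.0]]
noise = [[1.0, -1.0], [-1.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]

print(should_stop(signal))
print(should_stop(noise))
```

In a real training loop the per-example gradients would come from the current mini-batch, and one would probably smooth the statistic over several steps before acting on it, since a single batch can be misleading.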
