The paper 'Big Neural Networks Waste Capacity' observes that adding more layers / parameters does not improve accuracy as one would expect. When reading this paper, one should bear in mind that it was written well before [Deep Residual Learning for Image Recognition](http://www.shortscience.org/paper?bibtexKey=journals/corr/HeZRS15) or DenseNets.
In the experiments, they applied MLPs to SIFT features of ImageNet LSVRC-2010.
**Do not read this paper**. Instead, you might want to read "Deep Residual Learning for Image Recognition". It makes the same point, but more clearly, and offers a solution to the underfitting problem.
I don't understand why they write about k-means.
> Assuming minimal error in the human labelling of the dataset, it should be possible to reach errors close to 0%.
For ImageNet, the human labeling error is estimated at about 5% (I can't find the source for that, though).
> Improvements on ImageNet are thought to be a good proxy for progress in object recognition (Deng et al., 2009).
ImageNet images are very different from "typical web images" like the [100 million images Flickr dataset](http://yahoolabs.tumblr.com/post/89783581601/one-hundred-million-creative-commons-flickr-images-for).
This paper shows the effects of under-fitting in a neural network as the size of its single hidden layer increases. The overall model is composed of SIFT extraction, k-means, and a single-hidden-layer neural network. The paper suggests that this under-fitting problem is due to optimization problems with stochastic gradient descent.
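The experiment behind that claim can be sketched roughly as follows: train a one-hidden-layer MLP with plain SGD while sweeping the hidden layer size, and check whether the *training* error keeps falling toward 0% as capacity grows. This is a minimal numpy sketch on synthetic data standing in for the paper's bag-of-visual-words features (SIFT + k-means on ImageNet); the hidden sizes, learning rate, and epoch counts are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the paper's features: the real pipeline encodes
# SIFT descriptors with k-means into bag-of-visual-words vectors.
n_samples, n_features, n_classes = 256, 64, 10
X = rng.standard_normal((n_samples, n_features))
y = rng.integers(0, n_classes, size=n_samples)
Y = np.eye(n_classes)[y]  # one-hot targets


def train_mlp(hidden, epochs=10, lr=0.01):
    """One-hidden-layer MLP trained with plain SGD; returns the training error."""
    W1 = rng.standard_normal((n_features, hidden)) * 0.01
    b1 = np.zeros(hidden)
    W2 = rng.standard_normal((hidden, n_classes)) * 0.01
    b2 = np.zeros(n_classes)
    for _ in range(epochs):
        for i in rng.permutation(n_samples):
            h = np.maximum(X[i] @ W1 + b1, 0.0)  # ReLU hidden layer (sketch choice)
            logits = h @ W2 + b2
            p = np.exp(logits - logits.max())
            p /= p.sum()
            # Backprop of the cross-entropy loss, one sample at a time (pure SGD)
            d_logits = p - Y[i]
            dh = (W2 @ d_logits) * (h > 0)
            W2 -= lr * np.outer(h, d_logits)
            b2 -= lr * d_logits
            W1 -= lr * np.outer(X[i], dh)
            b1 -= lr * dh
    preds = np.argmax(np.maximum(X @ W1 + b1, 0.0) @ W2 + b2, axis=1)
    return float(np.mean(preds != y))


# Sweep the hidden layer size, as in the paper's capacity experiment:
# if the extra capacity were usable, training error should approach 0%.
for hidden in (16, 64, 256):
    print(f"hidden={hidden:4d}  train error={train_mlp(hidden):.3f}")
```

The paper's observation is that, in this setup, the training error stops improving long before 0%, which it attributes to SGD failing to exploit the added parameters rather than to a lack of capacity.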