The paper 'Big Neural Networks Waste Capacity' observes that adding more layers / parameters does not improve accuracy as one would expect. When reading this paper, one should bear in mind that it was written well before [Deep Residual Learning for Image Recognition](http://www.shortscience.org/paper?bibtexKey=journals/corr/HeZRS15) or DenseNets.
In the experiments, they applied MLPs to SIFT features of ImageNet LSVRC-2010.
**Do not read this paper**. Instead, you might want to read "Deep Residual Learning for Image Recognition". It makes the same point, but more clearly, and offers a solution to the underfitting problem.
I don't understand why they write about k-means.
> Assuming minimal error in the human labelling of the dataset, it should be possible to reach errors close to 0%.
For ImageNet, the human labeling error is estimated at about 5% (I can't find the source for that, though).
> Improvements on ImageNet are thought to be a good proxy for progress in object recognition (Deng et al., 2009).
ImageNet images are very different from "typical web images" like the [100 million images Flickr dataset](http://yahoolabs.tumblr.com/post/89783581601/one-hundred-million-creative-commons-flickr-images-for).
This paper shows the effects of under-fitting in a neural network as the size of its single hidden layer increases. The overall model is composed of SIFT extraction, k-means, and a single-hidden-layer neural network. The paper suggests that this under-fitting problem is due to optimization problems with stochastic gradient descent.
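The experiment behind that claim can be sketched roughly as follows: train a one-hidden-layer MLP with plain SGD while sweeping the hidden layer size, and check whether the *training* error keeps falling toward 0% as capacity grows. This is a minimal numpy sketch on synthetic data standing in for the paper's bag-of-visual-words features (SIFT + k-means on ImageNet); the hidden sizes, learning rate, and epoch counts are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the paper's features: the real pipeline encodes
# SIFT descriptors with k-means into bag-of-visual-words vectors.
n_samples, n_features, n_classes = 256, 64, 10
X = rng.standard_normal((n_samples, n_features))
y = rng.integers(0, n_classes, size=n_samples)
Y = np.eye(n_classes)[y]  # one-hot targets


def train_mlp(hidden, epochs=10, lr=0.01):
    """One-hidden-layer MLP trained with plain SGD; returns the training error."""
    W1 = rng.standard_normal((n_features, hidden)) * 0.01
    b1 = np.zeros(hidden)
    W2 = rng.standard_normal((hidden, n_classes)) * 0.01
    b2 = np.zeros(n_classes)
    for _ in range(epochs):
        for i in rng.permutation(n_samples):
            h = np.maximum(X[i] @ W1 + b1, 0.0)  # ReLU hidden layer (sketch choice)
            logits = h @ W2 + b2
            p = np.exp(logits - logits.max())
            p /= p.sum()
            # Backprop of the cross-entropy loss, one sample at a time (pure SGD)
            d_logits = p - Y[i]
            dh = (W2 @ d_logits) * (h > 0)
            W2 -= lr * np.outer(h, d_logits)
            b2 -= lr * d_logits
            W1 -= lr * np.outer(X[i], dh)
            b1 -= lr * dh
    preds = np.argmax(np.maximum(X @ W1 + b1, 0.0) @ W2 + b2, axis=1)
    return float(np.mean(preds != y))


# Sweep the hidden layer size, as in the paper's capacity experiment:
# if the extra capacity were usable, training error should approach 0%.
for hidden in (16, 64, 256):
    print(f"hidden={hidden:4d}  train error={train_mlp(hidden):.3f}")
```

The paper's observation is that, in this setup, the training error stops improving long before 0%, which it attributes to SGD failing to exploit the added parameters rather than to a lack of capacity.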