The paper 'Big Neural Networks Waste Capacity' observes that adding more layers / parameters does not necessarily improve accuracy. When reading this paper, one should bear in mind that it was written well before [Deep Residual Learning for Image Recognition](http://www.shortscience.org/paper?bibtexKey=journals/corr/HeZRS15) or DenseNets.
In the experiments, they applied MLPs to SIFT features of ImageNet LSVRC-2010.
**Do not read this paper**. Instead, you might want to read "Deep Residual Learning for Image Recognition". It makes the same point more clearly and offers a solution to the underfitting problem.
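To make the "solution to the underfitting problem" concrete: the core idea of residual learning is that a block computes `y = x + F(x)` instead of `y = F(x)`, so a block whose weights are zero is exactly the identity and extra depth should not, in principle, make the fit worse. The following is only a minimal pure-Python sketch of that idea. The function names (`dense`, `relu`, `residual_block`, `zeros`) are made up for illustration, and the fully connected layers are an assumption; the actual ResNet uses convolutions and batch normalization.

```python
def dense(weights, bias, x):
    """Affine layer: computes W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, vi) for vi in v]


def residual_block(w1, b1, w2, b2, x):
    """Illustrative residual block: y = x + F(x), F = dense -> relu -> dense.

    Input and output must have the same dimension so the skip
    connection can be added element-wise.
    """
    f = dense(w2, b2, relu(dense(w1, b1, x)))
    return [xi + fi for xi, fi in zip(x, f)]


def zeros(rows, cols):
    """Helper: a rows x cols matrix of zeros."""
    return [[0.0] * cols for _ in range(rows)]


if __name__ == "__main__":
    x = [1.0, -2.0, 3.0]
    # With all-zero weights, F(x) = 0 and the block passes x through unchanged.
    y = residual_block(zeros(3, 3), [0.0] * 3, zeros(3, 3), [0.0] * 3, x)
    print(y == x)
```

A plain (non-residual) block with zero weights would map every input to zero, which is one intuition for why very deep plain networks are harder to optimize than their residual counterparts.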
I don't understand why they write about k-means.
> Assuming minimal error in the human labelling of the dataset, it should be possible to reach errors close to 0%.
For ImageNet, the human labeling error is estimated at about 5% (I can't find the source for that, though), so errors close to 0% are not a realistic target.
> Improvements on ImageNet are thought to be a good proxy for progress in object recognition (Deng et al., 2009).
I'm not convinced: ImageNet images are very different from "typical web images" like the [100 million images Flickr dataset](http://yahoolabs.tumblr.com/post/89783581601/one-hundred-million-creative-commons-flickr-images-for).