Deep Networks with Stochastic Depth
Huang, Gao and Sun, Yu and Liu, Zhuang and Sedra, Daniel and Weinberger, Kilian
arXiv e-Print archive - 2016 via Local Bibsonomy
Keywords:
deeplearning, acreuser
**Dropout for layers** sums it up pretty well. The authors build on the idea of [deep residual networks](http://arxiv.org/abs/1512.03385): during training, whole residual blocks are randomly dropped and replaced by the identity function, so each mini-batch effectively trains a shallower network (see the sketch after the list below).
The main advantages:
* Training speed-ups by about 25%
* Huge networks without overfitting
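A minimal PyTorch sketch of the idea. The module name, channel sizes, and the fixed survival probability are my own illustration; the paper actually lets the survival probability decay linearly from 1.0 at the first block to 0.5 at the last, and scales the residual branch by that probability at test time.

```python
import torch
import torch.nn as nn

class StochasticDepthBlock(nn.Module):
    """Residual block that is randomly skipped (identity) during training."""

    def __init__(self, channels, survival_prob=0.8):
        super().__init__()
        self.survival_prob = survival_prob
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        if self.training:
            # With probability (1 - survival_prob) the residual branch is
            # dropped entirely and the block reduces to the identity.
            if torch.rand(1).item() > self.survival_prob:
                return x
            return self.relu(x + self.body(x))
        # At test time every block is active; the residual branch is scaled
        # by its survival probability (analogous to inference-time dropout).
        return self.relu(x + self.survival_prob * self.body(x))
```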
## Evaluation
* [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html): 4.91% error ([SotA](https://martin-thoma.com/sota/#image-classification): 2.72%). Training time: ~15h
* [CIFAR-100](https://www.cs.toronto.edu/~kriz/cifar.html): 24.58% error ([SotA](https://martin-thoma.com/sota/#image-classification): 17.18%). Training time: < 16h
* [SVHN](http://ufldl.stanford.edu/housenumbers/): 1.75% error ([SotA](https://martin-thoma.com/sota/#image-classification): 1.59%) - trained for 50 epochs, beginning with a learning rate of 0.1, divided by 10 after epochs 30 and 35 (see the scheduler sketch below). Training time: < 26h
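A hedged sketch of that SVHN learning-rate schedule using PyTorch's `MultiStepLR`; the model and optimizer here are placeholders, only the initial rate, the milestones, and the decay factor come from the summary above.

```python
import torch

# Placeholder model/optimizer; only the schedule itself reflects the summary.
model = torch.nn.Linear(10, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Start at LR 0.1, divide by 10 after epochs 30 and 35, train for 50 epochs.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[30, 35], gamma=0.1
)

for epoch in range(50):
    optimizer.step()   # stand-in for one epoch of training
    scheduler.step()   # advance the LR schedule once per epoch
```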