Deep Networks with Stochastic Depth
Huang, Gao and Sun, Yu and Liu, Zhuang and Sedra, Daniel and Weinberger, Kilian
arXiv e-Print archive - 2016 via Local Bibsonomy
Keywords:
deeplearning, acreuser
**Dropout for layers** sums it up pretty well. The authors build on the idea of [deep residual networks](http://arxiv.org/abs/1512.03385): during training, whole residual blocks are randomly dropped and replaced by the identity function, so each mini-batch effectively trains a shallower network (see the sketch after the list below).
The main advantages:
* Training speed-ups by about 25%
* Huge networks without overfitting
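A minimal PyTorch sketch of the idea. The module name, channel sizes, and the fixed survival probability are my own illustration; the paper actually lets the survival probability decay linearly from 1.0 at the first block to 0.5 at the last, and scales the residual branch by that probability at test time.

```python
import torch
import torch.nn as nn

class StochasticDepthBlock(nn.Module):
    """Residual block that is randomly skipped (identity) during training."""

    def __init__(self, channels, survival_prob=0.8):
        super().__init__()
        self.survival_prob = survival_prob
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        if self.training:
            # With probability (1 - survival_prob) the residual branch is
            # dropped entirely and the block reduces to the identity.
            if torch.rand(1).item() > self.survival_prob:
                return x
            return self.relu(x + self.body(x))
        # At test time every block is active; the residual branch is scaled
        # by its survival probability (analogous to inference-time dropout).
        return self.relu(x + self.survival_prob * self.body(x))
```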
## Evaluation
* [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html): 4.91% error ([SotA](https://martin-thoma.com/sota/#image-classification): 2.72%). Training time: ~15h
* [CIFAR-100](https://www.cs.toronto.edu/~kriz/cifar.html): 24.58% error ([SotA](https://martin-thoma.com/sota/#image-classification): 17.18%). Training time: < 16h
* [SVHN](http://ufldl.stanford.edu/housenumbers/): 1.75% error ([SotA](https://martin-thoma.com/sota/#image-classification): 1.59%) - trained for 50 epochs, beginning with a learning rate of 0.1, divided by 10 after epochs 30 and 35 (see the scheduler sketch below). Training time: < 26h
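A hedged sketch of that SVHN learning-rate schedule using PyTorch's `MultiStepLR`; the model and optimizer here are placeholders, only the initial rate, the milestones, and the decay factor come from the summary above.

```python
import torch

# Placeholder model/optimizer; only the schedule itself reflects the summary.
model = torch.nn.Linear(10, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Start at LR 0.1, divide by 10 after epochs 30 and 35, train for 50 epochs.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[30, 35], gamma=0.1
)

for epoch in range(50):
    optimizer.step()   # stand-in for one epoch of training
    scheduler.step()   # advance the LR schedule once per epoch
```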