This paper is about Convolutional Neural Networks for Computer Vision. It was the first break-through in the ImageNet classification challenge (LSVRC-2010, 1000 classes).
ReLU was a key aspect which was not so often used before. The paper also used Dropout in the last two layers.
## Training details
* Momentum of 0.9
* Learning rate of $\varepsilon$ (initialized at 0.01)
* Weight decay of $0.0005 \cdot \varepsilon$.
* Batch size of 128
* The training took 5 to 6 days on two NVIDIA GTX 580 3GB GPUs.
## See also
* [Stanford presentation](http://vision.stanford.edu/teaching/cs231b_spring1415/slides/alexnet_tugce_kyunghee.pdf)