Systematic evaluation of CNN advances on the ImageNet
Dmytro Mishkin, Nikolay Sergievskiy and Jiri Matas
arXiv e-Print archive - 2016
Keywords:
cs.NE, cs.CV, cs.LG
First published: 2016/06/07
Abstract: The paper systematically studies the impact of a range of recent advances in
CNN architectures and learning methods on the object categorization (ILSVRC)
problem. The evaluation tests the influence of the following choices of the
architecture: non-linearity (ReLU, ELU, maxout, compatibility with batch
normalization), pooling variants (stochastic, max, average, mixed), network
width, classifier design (convolutional, fully-connected, SPP), image
pre-processing, and of learning parameters: learning rate, batch size,
cleanliness of the data, etc.
The performance gains of the proposed modifications are first tested
individually and then in combination. The sum of individual gains is bigger
than the observed improvement when all modifications are introduced, but the
"deficit" is small suggesting independence of their benefits. We show that the
use of 128x128 pixel images is sufficient to make qualitative conclusions about
optimal network structure that hold for the full-size Caffe and VGG nets. The
results are obtained an order of magnitude faster than with the standard 224x224
pixel images.
The authors test different variants of CNN architectures, non-linearities, poolings, etc. on ImageNet.
Summary:
- use the ELU non-linearity without batch normalization, or ReLU together with it (both variants sketched after this list).
- apply a learned colorspace transformation to the RGB input (two layers of 1x1 convolutions; see the sketch below).
- use the linear learning rate decay policy (sketched below).
- use a sum of the average and max pooling layers (see the module sketch below).
- use a mini-batch size of around 128 or 256. If this is too big for your GPU,
  decrease the learning rate proportionally to the batch size (see the scaling example below).
- use the fully-connected layers as convolutional ones and average the predictions
  over spatial positions for the final decision (sketched below).
- when investing in increasing the training set size, check whether a plateau has not
  been reached.
- cleanliness of the data is more important than its size.
- if you cannot increase the input image size, reduce the stride in the subsequent
  layers; it has roughly the same effect (see the example below).
- if your network has a complex and highly optimized architecture, e.g. GoogLeNet,
  be careful with modifications.
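
Below are minimal sketches of the actionable points, assuming PyTorch (the paper's own experiments used Caffe); all layer widths, kernel sizes, and hyperparameters are illustrative, not the paper's exact values. First, the two recommended non-linearity variants:

```python
import torch.nn as nn

def conv_block_elu(in_ch, out_ch):
    # ELU variant: the paper found ELU works best *without* batch normalization.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ELU(inplace=True),
    )

def conv_block_relu_bn(in_ch, out_ch):
    # ReLU variant: pair it with batch normalization.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )
```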
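A sketch of the learned colorspace transform: two 1x1 convolutions placed in front of the network. The 10-channel intermediate width is an assumption; check the paper for the exact configuration.

```python
import torch.nn as nn

# Learns a per-pixel transformation of the RGB input before the main network.
learned_colorspace = nn.Sequential(
    nn.Conv2d(3, 10, kernel_size=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(10, 3, kernel_size=1),
)
```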
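The linear learning rate decay policy could look like this (placeholder model and hypothetical optimizer settings; `max_iter` is your total training length):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Linearly decay the lr from its base value to zero over max_iter steps.
max_iter = 100_000
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda it: max(0.0, 1.0 - it / max_iter)
)
# call scheduler.step() once per iteration, after optimizer.step()
```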
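The max+average pooling combination as a drop-in module (kernel size and stride are illustrative defaults):

```python
import torch.nn as nn

class SumPool2d(nn.Module):
    """Element-wise sum of max pooling and average pooling over the same window."""
    def __init__(self, kernel_size=3, stride=2):
        super().__init__()
        self.max_pool = nn.MaxPool2d(kernel_size, stride)
        self.avg_pool = nn.AvgPool2d(kernel_size, stride)

    def forward(self, x):
        return self.max_pool(x) + self.avg_pool(x)
```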
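The batch-size advice amounts to a linear scaling rule; a worked example with assumed reference values:

```python
# If the reference recipe is batch 256 at lr 0.01 (values assumed here)
# and only batch 64 fits in GPU memory, scale the lr by the same factor.
ref_batch, ref_lr = 256, 0.01
actual_batch = 64
lr = ref_lr * actual_batch / ref_batch  # -> 0.0025
```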
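Converting the fully-connected classifier head to convolutions and averaging the resulting grid of predictions might look like this (channel sizes are illustrative, in the spirit of a VGG-style head):

```python
import torch
import torch.nn as nn

# Former fc layers expressed as 1x1 convolutions over the feature map.
head = nn.Sequential(
    nn.Conv2d(512, 4096, kernel_size=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(4096, 1000, kernel_size=1),  # 1000 ImageNet classes
)

x = torch.randn(1, 512, 7, 7)        # feature map from the conv trunk
logits = head(x).mean(dim=(2, 3))    # average predictions over all positions
```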
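Finally, the stride trick: halving a layer's stride preserves spatial resolution that would otherwise be lost, which acts much like feeding a larger input to the layers above (hypothetical first-layer configurations):

```python
import torch.nn as nn

# Baseline first layer with stride 4.
conv_stride4 = nn.Conv2d(3, 64, kernel_size=7, stride=4, padding=3)
# Stride 2 yields a feature map twice as large in each dimension,
# roughly equivalent to doubling the input image size.
conv_stride2 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
```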