First published: 2016/06/07
Abstract: The paper systematically studies the impact of a range of recent advances in
CNN architectures and learning methods on the object categorization (ILSVRC)
problem. The evaluation tests the influence of the following architectural
choices: non-linearity (ReLU, ELU, maxout, compatibility with batch
normalization), pooling variants (stochastic, max, average, mixed), network
width, classifier design (convolutional, fully-connected, SPP), image
pre-processing, and of learning parameters: learning rate, batch size,
cleanliness of the data, etc.
The performance gains of the proposed modifications are first tested
individually and then in combination. The sum of individual gains is bigger
than the observed improvement when all modifications are introduced, but the
"deficit" is small, suggesting independence of their benefits. We show that the
use of 128x128 pixel images is sufficient to make qualitative conclusions about
optimal network structure that hold for the full size Caffe and VGG nets. The
results are obtained an order of magnitude faster than with the standard 224
pixel images.
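
The "deficit" mentioned above is the sum of the individual gains minus the gain
observed when all modifications are applied together. A minimal sketch of that
arithmetic follows. The function name and all numbers are hypothetical
placeholders chosen for illustration, not results from the paper.

```python
def additivity_deficit(individual_gains, combined_gain):
    """Return the sum of individual gains minus the combined gain.

    A value near zero means the gains add up almost perfectly, which is
    consistent with the modifications acting independently.
    """
    return sum(individual_gains) - combined_gain


# Hypothetical accuracy gains in percentage points (NOT taken from the paper).
individual = [1.0, 2.0, 0.5]   # each modification applied on its own
combined = 3.0                 # all modifications applied together

deficit = additivity_deficit(individual, combined)
relative = deficit / sum(individual)

print(f"deficit: {deficit:.2f} points ({relative:.0%} of the summed gains)")
```

A small deficit relative to the summed gains is what the abstract reads as
evidence that the benefits are largely independent.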