First published: 2013/12/16
Abstract: We propose a novel deep network structure called "Network In Network" (NIN)
to enhance model discriminability for local patches within the receptive field.
The conventional convolutional layer uses linear filters followed by a
nonlinear activation function to scan the input. Instead, we build micro neural
networks with more complex structures to abstract the data within the receptive
field. We instantiate the micro neural network with a multilayer perceptron,
which is a potent function approximator. The feature maps are obtained by
sliding the micro networks over the input in a similar manner as CNN; they are
then fed into the next layer. Deep NIN can be implemented by stacking multiple
of the above described structure. With enhanced local modeling via the micro
network, we are able to utilize global average pooling over feature maps in the
classification layer, which is easier to interpret and less prone to
overfitting than traditional fully connected layers. We demonstrated
state-of-the-art classification performance with NIN on CIFAR-10 and
CIFAR-100, and reasonable performance on the SVHN and MNIST datasets.
This paper studies a very natural generalization of convolutional layers:
the single linear filter that slides over the input feature map is replaced
with a "micro network" (a multilayer perceptron). The authors argue that good
abstractions are highly non-linear functions of the input data, and that
instead of generating an overcomplete set of feature maps and shrinking them
down in higher layers (as traditional CNNs do), it is better to compute a
stronger representation of each local patch before feeding it into the
next layer. Main contributions:
- Replaces the linear convolutional filter with a multilayer perceptron (the "mlpconv" layer; see the sketch after this list).
- Replaces the fully connected classification layers with global average pooling over the final feature maps.
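Because the micro network is shared across all spatial locations, an mlpconv layer is equivalent to a standard convolution followed by 1x1 convolutions (the paper's cascaded cross-channel parametric pooling). A minimal PyTorch-style sketch of one such block; the channel and kernel sizes here are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class MLPConv(nn.Module):
    """One NIN block: a kxk convolution followed by two 1x1 convolutions,
    i.e. a small MLP applied at every spatial location."""
    def __init__(self, in_channels, out_channels, kernel_size, padding=0):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size, padding=padding),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=1),  # MLP hidden layer
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=1),  # MLP output layer
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

# Example: a 5x5 mlpconv block producing 192 feature maps from a 32x32 RGB image.
x = torch.randn(1, 3, 32, 32)
y = MLPConv(3, 192, kernel_size=5, padding=2)(x)
print(y.shape)  # torch.Size([1, 192, 32, 32])
```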
## Strengths
- Natural generalization of convolutional layers and thorough analysis.
- Global average pooling of the final feature maps is easier to interpret and less prone to overfitting than fully connected layers (see the sketch after this list).
- Results better than or on par with the state of the art on CIFAR-10, CIFAR-100, SVHN, and MNIST.
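A minimal sketch of the global-average-pooling classifier, assuming the last mlpconv stage already outputs one feature map per class (10 here, as for CIFAR-10):

```python
import torch
import torch.nn as nn

num_classes = 10  # e.g. CIFAR-10

# Global average pooling reduces each of the num_classes feature maps to a
# single confidence value, so each class score is tied directly to one
# feature map: no fully connected layer and no extra parameters to overfit.
head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),  # (N, num_classes, H, W) -> (N, num_classes, 1, 1)
    nn.Flatten(),             # -> (N, num_classes), fed to softmax / cross-entropy
)

features = torch.randn(4, num_classes, 8, 8)  # dummy output of the last mlpconv layer
logits = head(features)
print(logits.shape)  # torch.Size([4, 10])
```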
## Weaknesses / Notes
- Should have explored NIN without dropout.
- Results on ImageNet missing.
- The global average pooling idea, although interpretable,
doesn't seem to lend itself easily to fine-tuning the network on
other datasets: in fine-tuning, we usually replace and retrain
just the last (fully connected) layer.