How transferable are features in deep neural networks? on ShortScience.org


How transferable are features in deep neural networks?
Yosinski, Jason and Clune, Jeff and Bengio, Yoshua and Lipson, Hod
Neural Information Processing Systems Conference - 2014 via Local Bibsonomy
Keywords: dblp


Summary by Abhishek Das 7 years ago

This paper studies the transferability of features learned at different layers
of a convolutional neural network. Typically, the initial layers of a CNN learn
features that resemble Gabor filters or color blobs and are fairly general, while
the later layers are more task-specific. Main contributions:

- They create two splits of the ImageNet dataset (A/B) and compare the performance
of the following network configurations:
    - Base: CNN trained on A or B.
    - Selffer: the first n layers are copied from a base network, and the rest of the
    network is randomly initialized and trained on the same task.
    - Transfer: the first n layers are copied from a base network trained on one split,
    and the rest of the network is randomly initialized and trained on the other split.
    - In both cases, the copied layers can either be fine-tuned or kept frozen.

- Selffer networks without fine-tuning perform poorly when the split is somewhere
in the middle of the network (n = 3-6). This is because neurons in neighboring layers
co-adapt to each other's activations in complex ways, and these co-adaptations are
broken when the network is split and the upper layers are retrained from scratch.
    - Closer to the final layers, there is less left for the network to learn, so
    these layers can be retrained independently.
    - Fine-tuning a selffer network gives it the chance to re-learn co-adaptations.

- Transfer networks with frozen copied layers perform better for smaller n than for
larger n, indicating that features become more task-specific in higher layers.
    - Fine-tuning transfer networks, however, improves performance beyond that of a
    network trained only on the target split. They argue that this better generalization
    is a lingering effect of having seen the base dataset, which persists even after
    considerable fine-tuning.

- Fine-tuning works much better than using random features.

- Features are more transferable across related tasks than unrelated tasks.
    - They study this by comparing two random splits of the classes (similar tasks)
    with a split into man-made vs. natural classes (dissimilar tasks).
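
The experimental setup described in the contributions above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the `Layer` class, the helpers `random_class_split`, `random_layer` and `build_network`, the 8-layer depth, and the tiny weight lists are all hypothetical stand-ins, and no actual training happens here. It only shows how a selffer or transfer network is assembled from a base network by copying the first n layers, optionally freezing them, and re-initializing the rest.

```python
import copy
import random
from dataclasses import dataclass


@dataclass
class Layer:
    weights: list           # stand-in for a real weight tensor
    trainable: bool = True  # False means the layer is kept frozen


def random_class_split(num_classes, rng):
    """Shuffle class ids and cut them into two disjoint halves (A and B)."""
    classes = list(range(num_classes))
    rng.shuffle(classes)
    half = num_classes // 2
    return sorted(classes[:half]), sorted(classes[half:])


def random_layer(size, rng):
    """A freshly initialized (untrained) layer."""
    return Layer([rng.gauss(0.0, 0.01) for _ in range(size)])


def build_network(base, n, fine_tune, rng):
    """Copy the first n layers of `base`; randomly re-initialize the rest.

    Copied layers stay frozen unless `fine_tune` is True.
    """
    layers = []
    for i, src in enumerate(base):
        if i < n:
            layer = copy.deepcopy(src)
            layer.trainable = fine_tune
        else:
            layer = random_layer(len(src.weights), rng)
        layers.append(layer)
    return layers


rng = random.Random(0)
split_a, split_b = random_class_split(1000, rng)

# Pretend these two 8-layer networks were already trained on A and B.
base_a = [random_layer(4, rng) for _ in range(8)]
base_b = [random_layer(4, rng) for _ in range(8)]

# Selffer: copy from the B network, then retrain the upper layers on B.
selffer = build_network(base_b, n=3, fine_tune=False, rng=rng)
# Transfer with fine-tuning: copy from the A network, then train on B.
transfer = build_network(base_a, n=3, fine_tune=True, rng=rng)
```

In this sketch the only difference between a selffer and a transfer network is which base network the layers are copied from relative to the split used for subsequent training; the `fine_tune` flag selects between the frozen and fine-tuned variants.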

## Strengths

- Experiments are thorough, and the results are intuitive and insightful.

## Weaknesses / Notes

- This paper only analyzes transferability across different splits of ImageNet
(as similar/dissimilar tasks). They should have reported results on transferability
from one task to another (classification/detection) or from one dataset to another
(ImageNet/MSCOCO).

- It would be interesting to study the role of dropout in preventing co-adaptations
while transferring features.
