Distilling the Knowledge in a Neural Network
Hinton, Geoffrey E. and Vinyals, Oriol and Dean, Jeffrey
arXiv e-Print archive - 2015 via Local Bibsonomy
Keywords:
dblp
#### Problem addressed:
Traditional classifiers are trained using hard targets. This not only forces the network to learn a very complex function (the target distribution is a single spike) but also ignores the relative similarity between classes: a truck, for example, is far more likely to be misclassified as a car than as a cat, yet the classifier is forced to assign the same target value to both the car and cat classes. This leads to poor generalization. This paper addresses this problem.
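To make this concrete, here is a tiny numerical sketch (the logits and class names are made-up values for illustration, not from the paper) showing how temperature-scaled soft targets retain the truck-vs-car similarity that hard targets discard:

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax: q_i = exp(z_i / T) / sum_j exp(z_j / T)
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

# Hypothetical logits for an image of a truck, over classes [truck, car, cat]
logits = [5.0, 3.0, -2.0]

print(softmax(logits, T=1.0))  # ~[0.88, 0.12, 0.001]: "car" similarity nearly vanishes
print(softmax(logits, T=4.0))  # ~[0.56, 0.34, 0.10]: relative similarity is preserved
```

A hard target would simply be `[1, 0, 0]`, discarding the information that a car is a far more plausible confusion than a cat.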
#### Summary:
In order to address the aforementioned problems, the paper proposes a method to generate soft labels for each sample by first training a cumbersome/large/complex classifier (such as a large dropout-regularized network) at a high "temperature", so that it produces soft probabilities for every sample that represent its membership in each class. It then trains a simpler (student) vanilla NN on the generated soft labels at the same high temperature, reverting to a low temperature once training is done, using either the same training data or a separate transfer set. By doing so, the simpler (student) model performs similarly to the complex (teacher) model.
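Here is a minimal PyTorch-style sketch of the resulting training objective; the function name `distillation_loss`, the temperature value, and the `alpha` weighting are my own illustrative choices, not from the paper's code:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, hard_targets,
                      temperature=4.0, alpha=0.5):
    """Weighted sum of a soft-label (teacher) term and a hard-label term.

    student_logits, teacher_logits: (batch, num_classes) raw scores
    hard_targets: (batch,) integer class labels
    """
    # Teacher's probabilities softened by temperature T
    soft_targets = F.softmax(teacher_logits / temperature, dim=1)
    # Student's log-probabilities at the same temperature
    log_student_soft = F.log_softmax(student_logits / temperature, dim=1)
    # Cross-entropy against the soft targets; the T^2 factor keeps its
    # gradient magnitude comparable to the hard-label term (as in the paper)
    soft_loss = F.kl_div(log_student_soft, soft_targets,
                         reduction="batchmean") * temperature ** 2
    # Standard cross-entropy against the true labels, at temperature 1
    hard_loss = F.cross_entropy(student_logits, hard_targets)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```

At test time the student uses the ordinary softmax (temperature 1); the high temperature is only used while matching the teacher's soft outputs.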
#### Novelty:
A technique for generating soft class labels that allows training a much simpler classifier in place of the large and complex models currently used, such as dropout networks or conv-nets.
#### Drawbacks:
I believe a major drawback of this paper is that it entails learning a complex classifier for generating soft labels. Another drawback is that it is incapable of using unlabeled data.
#### Datasets:
MNIST, JFT (internal Google image dataset)
#### Additional remarks:
#### Resources:
https://www.youtube.com/watch?v=7kAlBa7yhDM
#### Presenter:
Devansh Arpit