Distilling the Knowledge in a Neural Network
Hinton, Geoffrey E.
Vinyals, Oriol
Dean, Jeffrey
arXiv e-Print archive - 2015 via Local Bibsonomy
#### Problem addressed:
Traditional classifiers are trained using hard targets. This not only requires learning a very complex function (because the targets are spiky) but also ignores the relative similarity between classes, e.g., a truck is more likely to be misclassified as a car than as a cat. With hard targets, the classifier is instead forced to treat the car and the cat as equally wrong for a truck image, since both receive the same target value. This leads to poor generalization, which is the problem the paper addresses.
#### Summary:
To address these problems, the paper proposes generating soft labels for each sample. A cumbersome (large, complex) classifier, such as a large network trained with dropout, is trained first, and its softmax is then evaluated at a high "temperature" so that it produces soft probabilities representing each sample's membership in every class. A vanilla NN is then trained at the same high temperature on these soft labels, using either the original training data or a separate transfer set; once trained, it is used at a temperature of 1. In this way the simpler (student) model performs similarly to the complex (teacher) model.
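As a rough illustration of the soft labels described above, the sketch below divides the logits by a temperature before the softmax and then scores a student against the teacher's softened output with a cross-entropy. The logits, the class ordering (car, truck, cat), the temperature value, and the function names are all made up for illustration; this is not the authors' code, only a minimal sketch of the idea.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_cross_entropy(soft_targets, student_probs):
    """Cross-entropy between the teacher's soft targets and the student's predictions."""
    return -sum(p * math.log(q) for p, q in zip(soft_targets, student_probs))

# Hypothetical logits for the classes (car, truck, cat) on an image of a truck.
teacher_logits = [4.0, 6.0, -2.0]
student_logits = [1.0, 2.0, 0.5]
T = 4.0  # assumed temperature, chosen only for this example

# At temperature 1 the teacher's output is close to a hard target.
hard_like = softmax_with_temperature(teacher_logits, temperature=1.0)
# At a high temperature the similarity between truck and car becomes visible.
soft_targets = softmax_with_temperature(teacher_logits, temperature=T)
# The student is evaluated at the same temperature during training.
student_probs = softmax_with_temperature(student_logits, temperature=T)
loss = soft_cross_entropy(soft_targets, student_probs)

print("T=1 :", [round(p, 3) for p in hard_like])
print("T=%g:" % T, [round(p, 3) for p in soft_targets])
print("distillation loss:", round(loss, 3))
```

Raising the temperature moves probability mass from the top class to the others while preserving their ranking, so the soft targets still say that a truck looks more like a car than like a cat.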
#### Novelty:
A technique for generating soft class labels that can be used to train a much simpler classifier than the large, complex models currently in use, such as dropout networks and conv-nets.
#### Drawbacks:
I believe a major drawback of this paper is that it requires first training a complex classifier in order to generate the soft labels. Another drawback is that it cannot make use of unlabeled data.
#### Datasets:
MNIST, JFT (internal Google image dataset)
#### Additional remarks:
#### Resources:
#### Presenter:
Devansh Arpit