Data Distillation: Towards Omni-Supervised Learning
Girshick, Ross B.
arXiv e-Print archive - 2017 via Local Bibsonomy
* It's a semi-supervised method (the goal is to make use of unlabeled data in addition to labeled data).
* They first train a neural net normally, in the supervised way, on a labeled dataset.
* Then **they retrain the net using *its own predictions* on the originally unlabeled data as if they were ground truth** (but only when the net is confident enough about the prediction).
* More precisely, they retrain on the union of the original dataset and the examples labeled by the net itself. (Each minibatch is on average 60% original and 40% self-labeled.)
* When making these predictions (which will subsequently be used for training), they use **multi-transform inference**.
* They apply the net to differently transformed versions of the image (mirroring, scaling), transform the outputs back accordingly and combine the results.
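
The steps above can be sketched in a few lines of plain Python. This is only an illustrative toy, not the paper's implementation: images are 2D lists of floats, the "net" is any callable returning a same-shaped score map, mirroring is the only transform, and averaging is one possible way to combine the results. The names `hflip`, `multi_transform_predict`, `pseudo_label`, `mixed_minibatch` and the `threshold=0.9` default are all assumptions made for this example; only the 60/40 mixing ratio comes from the summary.

```python
import random

def hflip(grid):
    """Mirror a 2D grid (list of rows) left to right."""
    return [row[::-1] for row in grid]

def multi_transform_predict(model, image):
    """Average the score maps predicted for the image and its mirror."""
    direct = model(image)
    # Predict on the mirrored input, then mirror the output back so that
    # both score maps are aligned with the original image.
    mirrored = hflip(model(hflip(image)))
    return [[(a + b) / 2 for a, b in zip(r1, r2)]
            for r1, r2 in zip(direct, mirrored)]

def pseudo_label(model, unlabeled, threshold=0.9):
    """Self-label images, keeping only confident predictions.

    The threshold value and the binary-mask label format are assumptions.
    """
    kept = []
    for image in unlabeled:
        scores = multi_transform_predict(model, image)
        if max(max(row) for row in scores) >= threshold:
            label = [[1 if s >= threshold else 0 for s in row] for row in scores]
            kept.append((image, label))
    return kept

def mixed_minibatch(labeled, self_labeled, batch_size, rng, p_original=0.6):
    """Fill each slot from the original data with probability p_original,
    so a batch is on average 60% original and 40% self-labeled."""
    return [rng.choice(labeled) if rng.random() < p_original
            else rng.choice(self_labeled)
            for _ in range(batch_size)]

def toy_model(image):
    """Hypothetical stand-in for a trained net: slightly underscores column 0."""
    return [[v * (0.8 if j == 0 else 1.0) for j, v in enumerate(row)]
            for row in image]

if __name__ == "__main__":
    unlabeled = [[[1.0, 0.0], [0.0, 1.0]],   # confident after ensembling
                 [[0.5, 0.0], [0.0, 0.0]]]   # stays below the threshold
    self_labeled = pseudo_label(toy_model, unlabeled, threshold=0.85)
    labeled = [([[1.0, 1.0], [0.0, 0.0]], [[1, 1], [0, 0]])]
    batch = mixed_minibatch(labeled, self_labeled, 8, random.Random(0))
    print(len(self_labeled), len(batch))
```

In this toy, the model's bias against the left column is partly cancelled by averaging in the mirrored prediction, which is the intuition behind combining several transforms before trusting a prediction as a label. A real pipeline would retrain the net on batches drawn by something like `mixed_minibatch`.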