S$^\mathbf{4}$L: Self-Supervised Semi-Supervised Learning
Xiaohua Zhai, Avital Oliver, Alexander Kolesnikov, Lucas Beyer
arXiv e-Print archive, 2019
Keywords:
cs.CV, cs.LG
First published: 2019/05/09
Abstract: This work tackles the problem of semi-supervised learning of image
classifiers. Our main insight is that the field of semi-supervised learning can
benefit from the quickly advancing field of self-supervised visual
representation learning. Unifying these two approaches, we propose the
framework of self-supervised semi-supervised learning ($S^4L$) and use it to
derive two novel semi-supervised image classification methods. We demonstrate
the effectiveness of these methods in comparison to both carefully tuned
baselines, and existing semi-supervised learning methods. We then show that
$S^4L$ and existing semi-supervised methods can be jointly trained, yielding a
new state-of-the-art result on semi-supervised ILSVRC-2012 with 10% of labels.
It’s possible I’m missing something here, but my primary response to reading this paper is a sense of confusion: the approach is implicitly presented as novel, yet I don’t see a clear mechanism that distinguishes it from prior work. The premise of the paper is that self-supervised learning techniques (a subcategory of unsupervised learning, where losses are constructed from reconstruction or perturbation of the original image) can be combined with supervised learning by training on a weighted sum of the self-supervised loss and the supervised loss, making the overall method a semi-supervised one.
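The weighted combination described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names and the weight `w` are my own, and I use rotation prediction as the stand-in self-supervised task.

```python
import numpy as np

def cross_entropy(logits, labels):
    # Numerically stable softmax cross-entropy, averaged over the batch.
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def s4l_objective(class_logits, class_labels, rot_logits, rot_labels, w=1.0):
    """Sketch of the S^4L idea: supervised classification loss on labeled
    examples plus a weighted self-supervised loss (here, rotation
    prediction) that can also be computed on unlabeled examples.
    The weight w is an illustrative hyperparameter, not a value from
    the paper."""
    supervised = cross_entropy(class_logits, class_labels)
    self_supervised = cross_entropy(rot_logits, rot_labels)
    return supervised + w * self_supervised
```

In the actual training setup, the supervised term would be computed only on the labeled subset of each batch, while the self-supervised term covers labeled and unlabeled images alike.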
The self-supervision techniques they integrate into their semi-supervised framework are:
- Rotation prediction, where an image is rotated by one of four angles, and a classifier is trained to predict which angle was applied
- Exemplar representation invariance, where an image is cropped, mirrored, and color-randomized to produce augmented inputs, whose representations are then pushed to be closer to the representation of the unmodified image
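The two pretext tasks above can be sketched with NumPy. This is my own simplified illustration: the rotation labels follow the standard 4-way setup, but the invariance loss here is a plain squared distance rather than the triplet-style objective the paper actually uses.

```python
import numpy as np

def make_rotation_batch(images):
    """Build the 4-way rotation-prediction task: each image is rotated by
    0/90/180/270 degrees and labeled with the rotation index (a sketch;
    the paper's data pipeline details may differ)."""
    rotated, labels = [], []
    for img in images:
        for k in range(4):
            rotated.append(np.rot90(img, k))  # rotate in the H/W plane
            labels.append(k)
    return np.stack(rotated), np.array(labels)

def invariance_loss(rep_original, rep_augmented):
    """Pull the representation of an augmented crop toward that of the
    unmodified image. Squared L2 distance is a simplification of the
    exemplar objective."""
    return np.mean((rep_original - rep_augmented) ** 2)
```

Given a batch of shape `(N, H, W, C)`, `make_rotation_batch` returns `4N` images with rotation labels in `{0, 1, 2, 3}`, which are fed to a 4-way classification head.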
My confusion comes from the fact that I know I’ve read several semi-supervised learning papers that do things of this ilk (insofar as they combine unsupervised and supervised losses), so it seems strange to identify this as a novel contribution. That said, the paper does give an interesting overview of self-supervision techniques, and I found it valuable to read for that purpose.