When Does Contrastive Visual Representation Learning Work?
Elijah Cole, Xuan Yang, Kimberly Wilber, Oisin Mac Aodha, Serge Belongie
arXiv e-Print archive, 2021
Keywords:
cs.CV, cs.LG
Abstract: Recent self-supervised representation learning techniques have largely closed
the gap between supervised and unsupervised learning on ImageNet
classification. While the particulars of pretraining on ImageNet are now
relatively well understood, the field still lacks widely accepted best
practices for replicating this success on other datasets. As a first step in
this direction, we study contrastive self-supervised learning on four diverse
large-scale datasets. By looking through the lenses of data quantity, data
domain, data quality, and task granularity, we provide new insights into the
necessary conditions for successful self-supervised learning. Our key findings
include observations such as: (i) the benefit of additional pretraining data
beyond 500k images is modest, (ii) adding pretraining images from another
domain does not lead to more general representations, (iii) corrupted
pretraining images have a disparate impact on supervised and self-supervised
pretraining, and (iv) contrastive learning lags far behind supervised learning
on fine-grained visual classification tasks.
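
The paper evaluates contrastive pretraining methods in the SimCLR family. As a reference point for what "contrastive self-supervised learning" means here, below is a minimal sketch of the NT-Xent (normalized temperature-scaled cross-entropy) objective that such methods optimize; the function name, tensor shapes, and temperature value are illustrative assumptions, not details taken from the paper.

```python
# Sketch of the NT-Xent contrastive loss (SimCLR-style); illustrative only.
import torch
import torch.nn.functional as F


def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """z1, z2: [N, D] projections of two augmented views of the same N images."""
    n = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # [2N, D], unit-normalized
    sim = z @ z.t() / temperature                        # [2N, 2N] scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))                    # exclude self-similarity
    # The positive for sample i is its other view: index i+N (first half) or i-N (second half).
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)


# Usage: encode two random augmentations of each image with the same backbone
# and projection head, then minimize the loss over large batches.
if __name__ == "__main__":
    z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
    print(nt_xent_loss(z1, z2).item())
```

The paper's findings (diminishing returns beyond ~500k pretraining images, weak cross-domain transfer, and the gap on fine-grained tasks) concern how well representations learned with this kind of objective hold up outside the ImageNet setting.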