Debiased Contrastive Learning
Ching-Yao Chuang, Joshua Robinson, Lin Yen-Chen, Antonio Torralba, and Stefanie Jegelka
arXiv e-Print archive, 2020
Keywords:
cs.LG, stat.ML
Abstract: A prominent technique for self-supervised representation learning has been to
contrast semantically similar and dissimilar pairs of samples. Without access
to labels, dissimilar (negative) points are typically taken to be randomly
sampled datapoints, implicitly accepting that these points may actually
have the same label. Perhaps unsurprisingly, we observe that sampling
negative examples from truly different labels improves performance, in a
synthetic setting where labels are available. Motivated by this observation, we
develop a debiased contrastive objective that corrects for the sampling of
same-label datapoints, even without knowledge of the true labels. Empirically,
the proposed objective consistently outperforms the state-of-the-art for
representation learning in vision, language, and reinforcement learning
benchmarks. Theoretically, we establish generalization bounds for the
downstream classification task.
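The correction the abstract describes can be illustrated with a short NumPy sketch. This is a minimal, hedged reconstruction, not the authors' reference implementation: it assumes unit-norm embeddings, a single positive per anchor, a temperature `t`, and a known class prior `tau_plus` (the probability that a randomly sampled "negative" in fact shares the anchor's label).

```python
import numpy as np

def debiased_contrastive_loss(anchor, positive, negatives, tau_plus=0.1, t=0.5):
    """Debiased contrastive loss for a single anchor (illustrative sketch).

    anchor, positive: (d,) unit-norm embeddings of a positive pair.
    negatives: (N, d) unit-norm embeddings sampled from the unlabeled data,
               which may accidentally share the anchor's label.
    tau_plus: assumed prior probability that a sampled negative is a
              same-label (false) negative.
    t: softmax temperature.
    """
    pos = np.exp(anchor @ positive / t)      # positive-pair similarity score
    neg = np.exp(negatives @ anchor / t)     # scores of the sampled negatives
    N = negatives.shape[0]
    # Debiased estimate of the true-negative score: subtract the expected
    # contribution of same-label samples, then clip at the theoretical
    # minimum value exp(-1/t) to keep the estimate valid.
    g = np.maximum((neg.mean() - tau_plus * pos) / (1.0 - tau_plus),
                   np.exp(-1.0 / t))
    return -np.log(pos / (pos + N * g))
```

Setting `tau_plus=0` recovers the standard (biased) contrastive objective, which treats every sampled point as a true negative; a positive `tau_plus` discounts the expected same-label contamination among the negatives.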