Disclaimer: I am the first author.

# Executive summary

- The authors propose a new method, [*Centroid Networks*](https://arxiv.org/pdf/1902.08605.pdf), for learning to cluster.
- Given example clusterings of data, the goal is to learn how to cluster new data following the same criterion.
- Centroid Networks basically consist of running K-means on Prototypical Network features, plus many tricks.
- They evaluate Centroid Networks on Omniglot and miniImageNet (supervised few-shot classification benchmarks).
- Centroid Networks can compete with Prototypical Networks (state of the art in supervised few-shot classification) despite using no supervision at evaluation time (the labels of the support set are completely ignored).

## Pros

* **Simple training** (non end-to-end, very similar to Prototypical Networks, with additional tricks).
* **Very fast clustering** (nearly the same running time as Prototypical Networks).
* The authors claim that the Sinkhorn K-means formulation is empirically very stable: any initialization is fine as long as symmetries are broken (in practice, they initialize all centroids at 0 and add small Gaussian noise to the centroids at each step).

## Cons

* Clusters **need to be balanced** for now. Removing the balancedness constraint is future work.

# Setting

- They frame learning to cluster as a meta-learning problem, **few-shot clustering**.
- The goal is to cluster K*M images into K clusters of M images.
- Classes vary across tasks, but class semantics are the same (Omniglot: cluster by character, miniImageNet: cluster by object category).
- They also define a second task, **unsupervised few-shot classification**, solely for comparing with supervised few-shot classification methods.

# Method

Conceptually, Centroid Networks consist of training *Prototypical Networks* (meta-training), then running *K-means* on top of the protonet representations at clustering time (meta-evaluation). However, the authors propose several tricks that significantly improve upon that baseline:

- **Center loss**: when pretraining, this extra regularization term penalizes the intra-class variance.
- **Sinkhorn assignments**: when pretraining, replace the softmax predictions p(y|x) with a formulation based on optimal transport (Sinkhorn distances).
- **Sinkhorn K-means**: at clustering time, run the Sinkhorn K-means algorithm on the learned representations (see the sketch below).
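For concreteness, here is a minimal NumPy sketch of the Sinkhorn K-means step. It is a paraphrase for illustration, not the paper's implementation: the hyperparameters (`gamma`, `n_steps`, `n_iters`, the noise scale) are placeholders, and in the actual method the inputs are the learned protonet embeddings rather than random features.

```python
import numpy as np

def sinkhorn(cost, row_marginals, col_marginals, gamma=0.1, n_iters=50):
    """Entropy-regularized optimal transport via Sinkhorn iterations.

    Returns a soft assignment matrix P (n_points x n_clusters) whose rows
    sum to row_marginals and whose columns sum to col_marginals.
    """
    K = np.exp(-cost / gamma)                    # Gibbs kernel of the cost matrix
    u = np.ones_like(row_marginals)
    for _ in range(n_iters):
        v = col_marginals / (K.T @ u + 1e-12)    # rescale columns
        u = row_marginals / (K @ v + 1e-12)      # rescale rows
    return u[:, None] * K * v[None, :]

def sinkhorn_kmeans(embeddings, n_clusters, n_steps=20, gamma=0.1, noise=1e-3):
    """Balanced K-means on embeddings, with Sinkhorn (optimal transport) assignments.

    Centroids start at zero and a small Gaussian perturbation is added at
    every step to break symmetry, as described in the summary above.
    """
    n, d = embeddings.shape
    centroids = np.zeros((n_clusters, d))
    row_m = np.full(n, 1.0 / n)                      # each point carries mass 1/n
    col_m = np.full(n_clusters, 1.0 / n_clusters)    # each cluster receives mass 1/k (balanced)
    for _ in range(n_steps):
        centroids = centroids + noise * np.random.randn(n_clusters, d)
        # Squared Euclidean cost between every embedding and every centroid.
        cost = ((embeddings[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        plan = sinkhorn(cost, row_m, col_m, gamma=gamma)
        # Centroid update: transport-weighted mean of the points.
        centroids = (plan.T @ embeddings) / (plan.sum(axis=0)[:, None] + 1e-12)
    # Hard clustering: assign each point to its highest-mass cluster.
    return plan.argmax(axis=1), centroids

# Example: cluster 100 five-dimensional embeddings into 5 balanced clusters.
features = np.random.randn(100, 5)
labels, centroids = sinkhorn_kmeans(features, n_clusters=5)
```

The balanced column marginals are what enforce equal-sized clusters, which is also why the balancedness constraint listed under Cons is built into the method.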
# Results on Few-Shot Classification Benchmarks

- The task is **unsupervised few-shot classification**: cluster an *unlabeled* support set, then predict which clusters new images should be classified into.
- Target metric is **unsupervised accuracy**.
- *Unsupervised* few-shot classification is harder than *supervised* few-shot classification because *no labels* are given for the support set.
- They compare with a reference oracle, Prototypical Networks, which has access to the labeled support set.
- Centroid Networks are almost as good as Protonets on Omniglot (99.1% vs. reference 99.7%).
- Centroid Networks are comparable to Protonets on miniImageNet (53.1% vs. reference 66.9%).
- The proposed "tricks" are useful, as Centroid Networks beat the K-means (on Protonet features) baseline.

https://i.imgur.com/acQpQeq.png

https://i.imgur.com/FlHf9Ko.png

# Results on Learning to Cluster Benchmarks

- The task is **few-shot clustering**. After training on 30 alphabets of Omniglot, the task is to cluster 20 new alphabets (20-47 characters, with 20 instances per character).
- Target metric is **clustering accuracy**.
- Centroid Networks beat all flavors of Constrained Clustering Networks (86.6% vs. 83.3%).
- Centroid Networks are about 100 times faster than CCN, but less flexible (fixed cluster sizes).

https://i.imgur.com/PvH5V1W.png

# Code

The code is available at https://github.com/gabrielhuang/centroid-networks