Disclaimer: I am the first author.
# Executive summary
- The authors propose a new method, [*Centroid Networks*](https://arxiv.org/pdf/1902.08605.pdf), for learning to cluster.
- Given example clusterings of data, the goal is to learn how to cluster new data following the same criterion.
- Centroid Networks basically consist of running K-means on Prototypical Network features, plus many tricks.
- They evaluate Centroid Networks on Omniglot and miniImageNet (supervised few-shot classification benchmarks).
- Centroid Networks can compete with Prototypical Networks (state of the art in supervised few-shot classification) despite using no supervision at evaluation time (the labels of the support set are completely ignored).
## Pros
* **Simple training** (not end-to-end; very similar to Prototypical Networks, with additional tricks).
* **Very fast clustering** (nearly same running time as prototypical networks).
* The authors claim that the Sinkhorn K-means formulation is empirically very stable: any initialization works as long as symmetries are broken (in practice, they initialize all centroids at 0 and add a small Gaussian noise to the centroids at each step).
## Cons
* Clusters **need to be balanced** now. Removing the balanced constraint is future work.
# Setting
- They frame learning to cluster as a meta-learning problem, **few-shot clustering**.
- The goal is to cluster K×M images into K clusters of M images (an episode-sampling sketch follows this list).
- Classes vary across tasks, but class semantics are the same (Omniglot: cluster by character, miniImageNet: cluster by object category).
- They also define a second task, **unsupervised few-shot classification**, solely for comparison with supervised few-shot classification methods.
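To make the episode format concrete, here is a minimal sketch of how a few-shot clustering episode could be sampled from a labeled dataset. The sampling recipe is standard; the function and variable names are mine, not taken from the paper or the released code.

```python
import random
from collections import defaultdict

def sample_few_shot_clustering_episode(dataset, K, M):
    """Sample K classes and M images per class, then hide the labels.

    `dataset` is assumed to be a list of (image, class_id) pairs.
    The model only sees the K*M unlabeled images; the labels are kept
    aside solely for computing the evaluation metrics.
    """
    by_class = defaultdict(list)
    for image, class_id in dataset:
        by_class[class_id].append(image)

    classes = random.sample(list(by_class), K)
    images, hidden_labels = [], []
    for new_id, class_id in enumerate(classes):
        for image in random.sample(by_class[class_id], M):
            images.append(image)
            hidden_labels.append(new_id)  # used only for evaluation
    return images, hidden_labels
```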
# Method
Conceptually, Centroid Networks consist of training *Prototypical Networks* (meta-training), then running *K-means* on top of protonet representations at clustering time (meta-evaluation). However, the authors propose several tricks that significantly improve upon that baseline:
- **Center loss**: when pretraining, this extra regularization term penalizes the intra-class variance.
- **Sinkhorn assignments**: when pretraining, replace the softmax predictions p(y|x) with a formulation based on optimal transport (Sinkhorn distances).
- **Sinkhorn K-means**: at clustering time, run the Sinkhorn K-means algorithm on the learned representations (sketched right after this list).
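As an illustration of the clustering step, here is a minimal numpy sketch of Sinkhorn K-means as I would summarize it: hard nearest-centroid assignments are replaced by an entropy-regularized optimal transport plan between points and centroids with a balanced-cluster constraint, and centroids are re-estimated from the soft assignments. Hyperparameter names and values (`gamma`, iteration counts, noise scale) are placeholders; the exact procedure is in the paper and the released code.

```python
import numpy as np
from scipy.special import logsumexp

def sinkhorn_kmeans(x, K, n_steps=20, sinkhorn_iters=50, gamma=0.1, noise=1e-3):
    """Balanced soft K-means on embeddings x of shape (N, D), using Sinkhorn assignments."""
    N, D = x.shape
    centroids = np.zeros((K, D))            # initialize all centroids at 0
    log_r = np.log(np.full(N, 1.0 / N))     # each point carries equal mass
    log_c = np.log(np.full(K, 1.0 / K))     # balanced-cluster constraint

    for _ in range(n_steps):
        # small Gaussian noise breaks the symmetry between identical centroids
        centroids = centroids + noise * np.random.randn(K, D)

        # squared Euclidean cost between every point and every centroid
        cost = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)

        # Sinkhorn iterations: alternately project onto the row and column marginals
        log_p = -cost / gamma
        for _ in range(sinkhorn_iters):
            log_p = log_p + log_r[:, None] - logsumexp(log_p, axis=1, keepdims=True)
            log_p = log_p + log_c[None, :] - logsumexp(log_p, axis=0, keepdims=True)
        p = np.exp(log_p)                    # soft assignments: rows sum to 1/N, columns to 1/K

        # centroid update: assignment-weighted mean of the points
        centroids = (p.T @ x) / p.sum(axis=0)[:, None]

    return centroids, p
```

Because the column marginal forces every cluster to receive the same total mass, the recovered clusters are balanced, which is exactly the limitation mentioned in the Cons above.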
# Results on Few-Shot Classification Benchmarks
- The task is **unsupervised few-shot classification**: cluster an *unlabeled* support set, then predict which clusters new images should be classified into.
- Target metric is **unsupervised accuracy** (a sketch of how it can be computed follows this list).
- *Unsupervised* few-shot classification is harder than *supervised* few-shot classification because *no labels* are given in the support set.
- The reference is an oracle: Prototypical Networks, which have access to the labeled support set.
- Centroid Networks are almost as good as Protonets on Omniglot (99.1% vs. reference 99.7%).
- On miniImageNet the gap is larger, but Centroid Networks remain competitive given that they use no labels (53.1% vs. reference 66.9%).
- The proposed tricks are useful: Centroid Networks beat the baseline of running K-means directly on Protonet features.
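To make the metric concrete, here is a hedged sketch of how unsupervised accuracy can be computed, following the standard recipe of matching clusters to classes with the Hungarian algorithm; this is my own illustration, not necessarily the paper's exact evaluation code.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def unsupervised_accuracy(query_emb, query_labels, support_emb, support_labels, centroids):
    """Sketch of unsupervised accuracy: nearest-centroid predictions, with cluster
    indices matched to ground-truth classes by the Hungarian algorithm.

    Labels are used only here, for evaluation; clustering itself never sees them.
    """
    # assign support points to their nearest centroid
    support_pred = np.argmin(((support_emb[:, None] - centroids[None]) ** 2).sum(-1), axis=1)

    # confusion matrix between predicted clusters and true classes on the support set
    K = centroids.shape[0]
    confusion = np.zeros((K, K), dtype=int)
    for pred, true in zip(support_pred, support_labels):
        confusion[pred, true] += 1

    # optimal one-to-one matching between clusters and classes (maximize matches)
    rows, cols = linear_sum_assignment(-confusion)
    cluster_to_class = dict(zip(rows, cols))

    # classify query points to their nearest centroid, relabel, and score
    query_pred = np.argmin(((query_emb[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
    query_pred = np.array([cluster_to_class[p] for p in query_pred])
    return (query_pred == np.asarray(query_labels)).mean()
```

The clustering accuracy reported in the next section relies on the same kind of optimal matching, applied to the clustered set itself rather than to a held-out query set.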
![](https://i.imgur.com/acQpQeq.png)
![](https://i.imgur.com/FlHf9Ko.png)
# Results on Learning to Cluster Benchmarks
- The task is **few-shot clustering**. After training on 30 alphabets of Omniglot, the task is to cluster 20 new alphabets (20-47 characters, with 20 instances/character).
- Target metric is **clustering accuracy**.
- Centroid Networks beat all flavors of Constrained Clustering Networks (86.6% vs. 83.3%).
- Centroid Networks are about 100 times faster than CCNs, but less flexible (fixed cluster sizes).
![](https://i.imgur.com/PvH5V1W.png)
# Code
The code is available at https://github.com/gabrielhuang/centroid-networks