Idea:
- Fast: simply a linear projection from image feature to semantic tag: learn linear projection **W**. Especially, in testing time, it is almost O(1) complexity compared with the nearest neighbor methods with at least O(N) complexity.
- Enrich incomplete tags: learn tag enrichment projection **B** that turns on likely co-occurring tags with existing ones.
https://i.imgur.com/I1DUInN.png
Marginalized blank-out regularization
- Assume observed tags are corrupted, approximate the unknown corrupting distribution with piecewise uniform distribution.
- Stacking: multi-layer linear projection. Reconstruct tags that do not co-occur together but tend to appear within similar contexts.
- Rare tags and Non-Uniform Corruption: only optimize tags that have recall below some threshold in the validation set.