The paper presents a framework for classifying images that may come from either known or unknown classes. This is done by first mapping both images and classes into a joint embedding space. The probability that an image belongs to an unknown class is then estimated using a mixture of Gaussians. Experiments on CIFAR-10 show how performance varies depending on the threshold used to determine whether an image is of a known class or not.
The model first tries to detect whether an image contains an object from a previously unseen category. If not, the model relies on a regular, state-of-the-art supervised classifier to assign the image to one of the known classes. Otherwise, it attempts to identify what the object is by comparing the image with each unseen class in a learned joint image/class representation space. The method relies on pre-trained word representations, extracted from unlabelled text, to represent the classes. Experiments evaluate the trade-off between classification accuracy on the seen classes and on the unseen classes as the threshold for identifying an unseen class is varied.
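
To make the two-stage decision described above concrete, here is a minimal sketch in Python. It is not the authors' implementation. The function names, the isotropic Gaussian novelty score (a simplified stand-in for the paper's mixture of Gaussians), the nearest-mean seen-class classifier, and the toy 2-D embeddings are all assumptions made for illustration.

```python
import numpy as np

def novelty_score(z, seen_means, sigma=1.0):
    # Assumption: one isotropic Gaussian per seen class, centred on that
    # class's embedding. The score is near 0 close to a seen class and
    # approaches 1 far from all of them.
    sq_dists = np.sum((seen_means - z) ** 2, axis=1)
    return 1.0 - float(np.max(np.exp(-sq_dists / (2.0 * sigma ** 2))))

def nearest_index(z, class_vecs):
    # Index of the class embedding closest to z in Euclidean distance.
    return int(np.argmin(np.linalg.norm(class_vecs - z, axis=1)))

def classify(z, seen_means, unseen_vecs, threshold, sigma=1.0):
    # Stage 1: decide whether the image embedding looks novel.
    if novelty_score(z, seen_means, sigma) > threshold:
        # Stage 2a: zero-shot branch, pick the closest unseen class vector.
        return ("unseen", nearest_index(z, unseen_vecs))
    # Stage 2b: seen branch. A real system would call a supervised
    # classifier here; nearest seen mean is a hypothetical placeholder.
    return ("seen", nearest_index(z, seen_means))

# Toy 2-D joint space (hypothetical values).
seen_means = np.array([[0.0, 0.0], [4.0, 0.0]])
unseen_vecs = np.array([[0.0, 5.0], [4.0, 5.0]])

near_seen = np.array([0.1, 0.0])
near_unseen = np.array([3.9, 4.8])

result_seen = classify(near_seen, seen_means, unseen_vecs, threshold=0.5)
result_unseen = classify(near_unseen, seen_means, unseen_vecs, threshold=0.5)
# With the threshold at 1.0 nothing is flagged as novel.
result_strict = classify(near_unseen, seen_means, unseen_vecs, threshold=1.0)
```

Raising the threshold sends more images to the seen-class branch and lowering it sends more to the zero-shot branch, which is the trade-off the experiments measure.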