Language in a Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification
arXiv e-Print archive - 2022 via Local arXiv
First published: 2024/02/27
Abstract: Concept Bottleneck Models (CBM) are inherently interpretable models that
factor model decisions into human-readable concepts. They allow people to
easily understand why a model is failing, a critical feature for high-stakes
applications. CBMs require manually specified concepts and often under-perform
their black box counterparts, preventing their broad adoption. We address these
shortcomings and are the first to show how to construct high-performance CBMs,
with accuracy similar to black box models, without manual specification. Our
approach, Language Guided Bottlenecks (LaBo), leverages a language model,
GPT-3, to define a large space of possible bottlenecks. Given a problem domain,
LaBo uses GPT-3 to produce factual sentences about categories to form candidate
concepts. LaBo efficiently searches possible bottlenecks through a novel
submodular utility that promotes the selection of discriminative and diverse
information. Ultimately, GPT-3's sentential concepts can be aligned to images
using CLIP, to form a bottleneck layer. Experiments demonstrate that LaBo is a
highly effective prior for concepts important to visual recognition. In the
evaluation with 11 diverse datasets, LaBo bottlenecks excel at few-shot
classification: they are 11.7% more accurate than black box linear probes at 1
shot and comparable with more data. Overall, LaBo demonstrates that inherently
interpretable models can be widely applied at similar, or better, performance
than black box approaches.
What is the paper doing?
This paper proposed a way to explain model decisions in terms of human-readable concepts. For example, if the model classifies an image as a black-throated sparrow, a human can understand this decision by inspecting the descriptors (concepts) that support it.
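To make the idea concrete, here is a minimal sketch (not the authors' code) of how an explanation can be read off a linear concept bottleneck: each class logit is a weighted sum of concept scores, so the per-concept products show which concepts drove the prediction. The function name `explain_prediction`, the concept strings, and all numbers are hypothetical toy values.

```python
import numpy as np

def explain_prediction(concept_scores, weights, concept_names, class_names, top_n=3):
    """Return the predicted class and the concepts contributing most to it."""
    # In a linear bottleneck, logit[y] = sum_c weights[y, c] * concept_scores[c]
    logits = weights @ concept_scores
    pred = int(np.argmax(logits))
    # Each product weights[pred, c] * concept_scores[c] is concept c's share of the winning logit
    contributions = weights[pred] * concept_scores
    top = np.argsort(contributions)[::-1][:top_n]
    return class_names[pred], [(concept_names[i], float(contributions[i])) for i in top]

# Hypothetical toy bottleneck: 4 concepts, 2 classes
concepts = ["black throat patch", "white stripe above the eye",
            "bright red crown", "thick orange bill"]
classes = ["black-throated sparrow", "northern cardinal"]
scores = np.array([0.9, 0.8, 0.1, 0.2])          # concept scores for one image
W = np.array([[1.0, 1.0, 0.0, 0.0],              # class-concept weights
              [0.0, 0.0, 1.0, 1.0]])

label, evidence = explain_prediction(scores, W, concepts, classes, top_n=2)
print(label)     # black-throated sparrow
print(evidence)  # [('black throat patch', 0.9), ('white stripe above the eye', 0.8)]
```

Because the explanation is just the terms of the predicted class's weighted sum, it is faithful to what the final layer actually computed, which is the property that makes bottleneck models inherently interpretable.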
The descriptors were obtained from GPT-3: the authors generated 500 descriptors for each class and then removed the class name from each descriptor. Then, for each class, they chose $k$ concepts so that every class has the same number of concepts.
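The class-name removal step can be sketched with regular expressions. This is an assumption about how such filtering might be done, not LaBo's actual preprocessing: the helper `strip_class_name` is hypothetical, it also drops a leading article and a plural "s", and it discards descriptors that become empty.

```python
import re

def strip_class_name(descriptors, class_name):
    """Remove mentions of class_name (plus a leading article or plural 's') from each descriptor."""
    pattern = re.compile(
        r"\b(?:(?:the|a|an)\s+)?" + re.escape(class_name) + r"s?\b",
        flags=re.IGNORECASE,
    )
    cleaned = []
    for text in descriptors:
        text = pattern.sub("", text)
        text = re.sub(r"\s+", " ", text).strip()  # collapse leftover whitespace
        if text:                                    # drop descriptors that were only the name
            cleaned.append(text)
    return cleaned

raw = [
    "The black-throated sparrow has a black throat patch.",
    "Black-throated sparrows live in deserts.",
    "a white stripe above the eye",
]
print(strip_class_name(raw, "black-throated sparrow"))
# ['has a black throat patch.', 'live in deserts.', 'a white stripe above the eye']
```

Removing the name keeps a concept from trivially matching its own class label, so the bottleneck has to rely on the described visual attributes.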
After that, they fed these concepts into a concept selection module that picks a more fine-grained subset of concepts for each class, using a submodular utility that favors discriminative and diverse concepts.
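The abstract describes the selection module as a submodular utility that promotes discriminative and diverse concepts. The sketch below uses a stand-in utility, not LaBo's exact formulation: a per-concept discriminativeness score `disc` (hypothetical values here) plus a facility-location coverage term over a nonnegative similarity matrix `sim`, maximized with the standard greedy algorithm. Both terms, the weights `alpha` and `beta`, and the name `greedy_select` are assumptions.

```python
import numpy as np

def greedy_select(disc, sim, k, alpha=1.0, beta=1.0):
    """Greedily pick k concepts maximizing the (assumed) submodular utility
    F(S) = alpha * sum_{c in S} disc[c] + beta * sum_i max_{j in S} sim[i, j].
    Assumes sim is nonnegative, so the max over an empty set is taken as 0.
    """
    n = len(disc)
    selected = []
    coverage = np.zeros(n)  # current max_{j in S} sim[i, j] for every candidate i
    for _ in range(min(k, n)):
        best, best_gain = None, -np.inf
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain of adding concept j to the current selection
            cov_gain = np.maximum(coverage, sim[:, j]).sum() - coverage.sum()
            gain = alpha * disc[j] + beta * cov_gain
            if gain > best_gain:
                best, best_gain = j, gain
        selected.append(best)
        coverage = np.maximum(coverage, sim[:, best])
    return selected

# Toy candidates: concept 1 is a near-duplicate of concept 0
disc = np.array([0.9, 0.85, 0.6, 0.1])
sim = np.array([
    [1.0, 0.95, 0.1, 0.2],
    [0.95, 1.0, 0.1, 0.2],
    [0.1, 0.1, 1.0, 0.3],
    [0.2, 0.2, 0.3, 1.0],
])
print(greedy_select(disc, sim, k=2))  # [0, 2]
```

Concept 1 is more discriminative than concept 2, but it adds almost no coverage once concept 0 is chosen, so the greedy step prefers the distinct concept 2. For monotone submodular objectives like this one, greedy selection comes with the well-known (1 - 1/e) approximation guarantee, which is why such utilities make large candidate pools tractable to search.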
Then, they encoded these concepts and the image with CLIP and used the image-concept similarity as the score for each concept.
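Once the image and the concept sentences are embedded, each concept score can be computed as a cosine similarity. In practice the embeddings would come from CLIP's image and text encoders; the sketch below assumes they have already been computed and uses toy 3-dimensional vectors so it runs with NumPy alone. `concept_scores` is a hypothetical helper name.

```python
import numpy as np

def concept_scores(image_embs, concept_embs):
    """Cosine similarity between n image embeddings (n, d) and m concept embeddings (m, d)."""
    img = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    txt = concept_embs / np.linalg.norm(concept_embs, axis=1, keepdims=True)
    return img @ txt.T  # shape (n, m): one score per (image, concept) pair

# Toy stand-ins for CLIP outputs (real CLIP embeddings are much higher-dimensional)
images = np.array([[1.0, 0.0, 0.0]])
concepts = np.array([[2.0, 0.0, 0.0],
                     [0.0, 3.0, 0.0],
                     [1.0, 1.0, 0.0]])
scores = concept_scores(images, concepts)
# First concept points the same way (1.0), second is orthogonal (0.0), third is about 0.707
print(scores)
```

Each row of the result is the vector of concept scores for one image, which is exactly the input the class-concept weight matrix consumes.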
Finally, they placed a class-concept weight matrix on top of the CLIP concept scores; this matrix is trained to weight the scores and output the predicted class. Note that this weight matrix was initialized with language priors.
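The last layer can be sketched as a linear map from concept scores to class logits trained with cross-entropy. The summary only says the weights were initialized with language priors; here that prior is assumed to mean W[y, c] = 1 when concept c was generated for class y and 0 otherwise. The helper names (`init_prior_weights`, `train_step`), the learning rate, and the toy scores are hypothetical.

```python
import numpy as np

def init_prior_weights(concept_to_class, num_classes):
    """Assumed language prior: W[y, c] = 1 if concept c was generated for class y, else 0."""
    W = np.zeros((num_classes, len(concept_to_class)))
    for c, y in enumerate(concept_to_class):
        W[y, c] = 1.0
    return W

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_step(W, scores, labels, lr=0.5):
    """One gradient-descent step on mean cross-entropy; returns (new W, loss before the step)."""
    n = scores.shape[0]
    probs = softmax(scores @ W.T)                # (n, num_classes)
    onehot = np.eye(W.shape[0])[labels]          # (n, num_classes)
    loss = -np.log(probs[np.arange(n), labels]).mean()
    grad = (probs - onehot).T @ scores / n       # dLoss/dW, shape (num_classes, m)
    return W - lr * grad, loss

# Hypothetical toy setup: concepts 0-1 came from class 0, concepts 2-3 from class 1
W = init_prior_weights([0, 0, 1, 1], num_classes=2)
scores = np.array([[0.9, 0.8, 0.1, 0.2],         # concept scores, image of class 0
                   [0.1, 0.2, 0.7, 0.9]])        # concept scores, image of class 1
labels = np.array([0, 1])

losses = []
for _ in range(50):
    W, loss = train_step(W, scores, labels)
    losses.append(loss)
preds = (scores @ W.T).argmax(axis=1)
print(preds)  # [0 1]
```

Starting from the prior means the model already predicts sensibly before seeing any labeled images, which is consistent with the abstract's report that LaBo is especially strong in the few-shot setting.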