First published: 2015/12/07

Abstract: Image representations, from SIFT and bag of visual words to Convolutional
Neural Networks (CNNs) are a crucial component of almost all computer vision
systems. However, our understanding of them remains limited. In this paper we
study several landmark representations, both shallow and deep, by a number of
complementary visualization techniques. These visualizations are based on the
concept of "natural pre-image", namely a natural-looking image whose
representation has some notable property. We study in particular three such
visualizations: inversion, in which the aim is to reconstruct an image from its
representation, activation maximization, in which we search for patterns that
maximally stimulate a representation component, and caricaturization, in which
the visual patterns that a representation detects in an image are exaggerated.
We pose these as a regularized energy-minimization framework and demonstrate
its generality and effectiveness. In particular, we show that this method can
invert representations such as HOG more accurately than recent alternatives
while being applicable to CNNs too. Among our findings, we show that several
layers in CNNs retain photographically accurate information about the image,
with different degrees of geometric and photometric invariance.
This paper is about finding natural-looking images ("natural pre-images") for analyzing the image representations used in computer vision, both shallow (e.g. HOG) and deep (CNNs). It studies three visualization techniques:
* **inversion**: the aim is to reconstruct an image from its representation
* **activation maximization**: search for patterns that maximally stimulate a representation component (deep dream). This does NOT use an initial natural image.
* **caricaturization**: exaggerate the visual patterns that a representation detects in an image
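
The paper poses all three techniques as regularized energy minimization. Below is a minimal sketch of the inversion case only, under loud assumptions: the representation is a toy one (a random linear map followed by `tanh`) rather than HOG or a CNN, the regularizer is a simple quadratic smoothness penalty standing in for the paper's natural-image priors, and the "image" is a 1-D signal. The names `phi`, `energy_and_grad` and `invert` are hypothetical and are not taken from the deep-goggle code.

```python
import numpy as np


def phi(W, x):
    """Toy representation (an assumption): linear map followed by tanh."""
    return np.tanh(W @ x)


def energy_and_grad(W, x, phi0, lam):
    """Energy = normalized reconstruction loss + lam * smoothness penalty."""
    feat = np.tanh(W @ x)
    resid = feat - phi0
    norm = np.sum(phi0 ** 2)
    loss = np.sum(resid ** 2) / norm
    # Chain rule through tanh: d tanh(z) / dz = 1 - tanh(z)^2.
    grad_loss = W.T @ (2.0 * resid * (1.0 - feat ** 2)) / norm

    # Quadratic penalty on differences between neighbouring samples.
    diff = np.diff(x)
    reg = np.sum(diff ** 2)
    grad_reg = np.zeros_like(x)
    grad_reg[:-1] -= 2.0 * diff
    grad_reg[1:] += 2.0 * diff

    return loss + lam * reg, grad_loss + lam * grad_reg


def invert(W, phi0, lam=1e-3, step=0.3, iters=500, seed=0):
    """Plain gradient descent from a small random start (no natural image)."""
    rng = np.random.default_rng(seed)
    x = 0.01 * rng.standard_normal(W.shape[1])
    history = []
    for _ in range(iters):
        energy, grad = energy_and_grad(W, x, phi0, lam)
        history.append(energy)
        x = x - step * grad
    return x, history


# Usage: encode a smooth signal, then try to recover it from its code alone.
rng = np.random.default_rng(42)
W = rng.standard_normal((64, 32)) / np.sqrt(64)
x0 = np.sin(2.0 * np.pi * np.linspace(0.0, 1.0, 32))
x_hat, history = invert(W, phi(W, x0))
print("energy: first", history[0], "last", history[-1])
```

In this framing, activation maximization and caricaturization would presumably keep the regularizer and the optimizer and change only the loss term, for example rewarding a large response of one component instead of penalizing the distance to a target code; that variation is not shown here.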
The introduction is nice.
The paper comes with code: [robots.ox.ac.uk/~vgg/research/invrep](http://www.robots.ox.ac.uk/~vgg/research/invrep/index.html) ([GitHub: aravindhm/deep-goggle](https://github.com/aravindhm/deep-goggle))
* 2013, Zeiler & Fergus: [Visualizing and Understanding Convolutional Networks](http://www.shortscience.org/paper?bibtexKey=journals/corr/ZeilerF13#martinthoma)