Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps on ShortScience.org

arxiv.org
scholar.google.com

Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
Simonyan, Karen and Vedaldi, Andrea and Zisserman, Andrew
arXiv e-Print archive - 2013 via Local Bibsonomy
Keywords: dblp

Summaries/Notes 3

[link] Summary by Shagun Sodhani 7 years ago

#### Introduction

* The paper presents gradient computation based techniques to visualise image classification models.
* [Link to the paper](https://arxiv.org/abs/1312.6034)

#### Experimental Setup

* Single deep convNet trained on ILSVRC-2013 dataset (1.2M training images and 1000 classes).
* Weight layer configuration is: conv64-conv256-conv256-conv256-conv256-full4096-full4096-full1000.

#### Class Model Visualisation

* Given a learnt ConvNet and a class (of interest), start with the zero image and perform optimisation by back propagating with respect to the input image (keeping the ConvNet weights constant).
* Add the mean image (for training set) to the resulting image.
* The paper used unnormalised class scores so that optimisation focuses on increasing the score of target class and not decreasing the score of other classes.

#### Image-Specific Class Saliency Visualisation

* Given an image, class of interest, and trained ConvNet, rank the pixels of the input image based on their influence on class scores.
* Derivative of the class score with respect to image gives an estimate of the importance of different pixels for the class.
* The magnitude of derivative also indicated how much each pixel needs to be changed to improve the class score.

##### Class Saliency Extraction

* Find the derivative of the class score with respect with respect to the input image.
* This would result in one single saliency map per colour channel.
* To obtain a single saliency map, take the maximum magnitude of derivative across all colour channels.

##### Weakly Supervised Object Localisation

* The saliency map for an image provides a rough encoding of the location of the object of the class of interest. 
* Given an image and its saliency map, an object segmentation map can be computed using GraphCut colour segmentation.
* Color continuity cues are needed as saliency maps might capture only the most dominant part of the object in the image.
* This weakly supervised approach achieves 46.4% top-5 error on the test set of ILSVRC-2013.

#### Relation to Deconvolutional Networks

* DeconvNet-based reconstruction of the $n^{th}$ layer input is similar to computing the gradient of the visualised neuron activity $f$ with respect to the input layer.
* One difference is in the way RELU neurons are treated: 
    * In DeconvNet, the sign indicator (for the derivative of RELU) is computed on output reconstruction while in this paper, the sign indicator is computed on the layer input.

Your comment:

Write your summary here (You can use $\LaTeX$ and markdown syntax):

Anon Private