Saeed Izadi's profile - ShortScience.org


Saliency Detection: A Boolean Map Approach
Zhang, Jianming and Sclaroff, Stan
International Conference on Computer Vision - 2013 via Local Bibsonomy
Keywords: dblp

Summary by Saeed Izadi 7 years ago

Main Purpose:
* The main goal of the proposed method is to exploit a global perception
mechanism, known as figure-ground segregation, together with the Boolean Map
Theory of visual attention to compute a saliency map.

Drawbacks of previous works:
* Most previous works do not exploit the topological structure of an image in
saliency calculation. This paper therefore aims to exploit the topological
structure of a scene when computing saliency.

Main Idea:
* According to the Boolean Map Theory of visual attention, an observer's momentary
conscious awareness of a scene can be represented by a Boolean map. Given an
input image, a set of Boolean maps can be generated by randomly thresholding its
feature channels, e.g. the color channels. Once the set of Boolean maps is generated,
concepts from figure-ground segregation, introduced by Gestalt
psychological studies, can be used to create a saliency map. As these
studies suggest, figures are more likely to be attended to than
background elements. But how are figures characterized? Several factors
are likely to influence figure-ground segregation, such as size, surroundedness
and symmetry. The proposed method uses the surroundedness factor to form a
saliency map. This factor implies that figures tend to be surrounded, that is,
they are likely to have a closed outer contour. The factor can therefore be
evaluated over Boolean maps: each region that has a closed outer contour and is
not connected to the image borders gets a higher attention value. Each Boolean
map thus results in an attention map. The attention maps are then summed linearly to
form a saliency map.
In brief, the contribution of the proposed method is twofold: 1) It uses Boolean
maps to characterize an image. 2) It exploits figure-ground segregation
concepts to form a saliency map.
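
The surroundedness test described above can be sketched with a flood fill from the image border. This is a minimal illustration using only the Python standard library, not the authors' implementation: the function name `surrounded_regions` and the choice of 4-connectivity are assumptions, and both the 0-regions and 1-regions of the map are treated as candidate figures.

```python
from collections import deque


def surrounded_regions(boolean_map):
    """Return an attention map with 1 for pixels in surrounded regions.

    A region is a 4-connected set of pixels sharing the same Boolean value.
    It counts as surrounded when none of its pixels lies on the image border.
    (Function name and connectivity are illustrative assumptions.)
    """
    h, w = len(boolean_map), len(boolean_map[0])
    touches_border = [[False] * w for _ in range(h)]
    queue = deque()

    # Seed the flood fill with every border pixel.
    for y in range(h):
        for x in range(w):
            if y in (0, h - 1) or x in (0, w - 1):
                touches_border[y][x] = True
                queue.append((y, x))

    # Spread to neighbours that carry the same Boolean value.
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < h and 0 <= nx < w
                    and not touches_border[ny][nx]
                    and boolean_map[ny][nx] == boolean_map[y][x]):
                touches_border[ny][nx] = True
                queue.append((ny, nx))

    # Whatever the fill did not reach is enclosed by a closed contour.
    return [[0 if touches_border[y][x] else 1 for x in range(w)]
            for y in range(h)]
```

For a map with a 2x2 block of ones in the interior and a second group of ones on the right edge, only the interior block is activated, because the other group is connected to the border.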

Implementation Details:
* Given an input image, a set of Boolean maps is generated by thresholding the
color channels (Lab) with uniformly selected thresholds from 0 to 255. Then, each
Boolean map is evaluated for surroundedness. Regions that are surrounded
(closed outer contour) get the value 1 and all others 0. The resulting maps are called
attention maps. After some normalization steps, the attention maps (one per Boolean map) are
summed linearly to form a mean attention map. Further normalization and
post-processing steps are then applied to obtain the final saliency map.
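
The thresholding and averaging steps can be sketched as follows. This is a rough illustration under stated assumptions rather than the paper's code: the names `boolean_maps` and `mean_attention_map`, the threshold spacing `step`, and the use of L2 normalization are not given in the summary, and the surroundedness evaluation and post-processing are left out.

```python
import math


def boolean_maps(channel, step=8):
    """Yield one Boolean map per evenly spaced threshold of an 8-bit channel.

    `step` (the spacing between thresholds) is an assumed parameter.
    """
    for t in range(step, 256, step):
        yield [[1 if v > t else 0 for v in row] for row in channel]


def mean_attention_map(attention_maps):
    """L2-normalize each attention map, then average them pixel-wise.

    L2 normalization is an assumption; the summary only says
    "some normalization steps". Expects at least one map.
    """
    maps = list(attention_maps)
    h, w = len(maps[0]), len(maps[0][0])
    total = [[0.0] * w for _ in range(h)]
    for m in maps:
        norm = math.sqrt(sum(v * v for row in m for v in row))
        if norm == 0:
            continue  # nothing was activated in this map
        for y in range(h):
            for x in range(w):
                total[y][x] += m[y][x] / norm
    return [[v / len(maps) for v in row] for row in total]
```

Dividing by the L2 norm gives maps with small, concentrated active areas more weight per pixel than maps in which large regions are activated.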

Conclusion:
In this work, a novel Boolean Map based Saliency (BMS) model is proposed to leverage
the surroundedness cue that helps in figure-ground segregation. The model
borrows the concept of a Boolean map from the Boolean Map Theory of visual
attention and characterizes an image by a set of Boolean maps. This representation
leads to an efficient algorithm for saliency detection. The authors report that BMS is
the only model that consistently achieves state-of-the-art performance on five benchmark
eye tracking datasets, and it is also shown to be useful in salient object detection.


Effective deep learning training for single-image super-resolution in endomicroscopy exploiting video-registration-based reconstruction
Ravì, Daniele and Szczotka, Agnieszka Barbara and Shakir, Dzhoshkun Ismail and Pereira, Stephen P. and Vercauteren, Tom
arXiv e-Print archive - 2018 via Local Bibsonomy
Keywords: dblp

Summary by Saeed Izadi 7 years ago

Main purpose:
* This work proposes a software-based resolution augmentation method which is more agile and simpler to implement than hardware engineering solutions.
* The paper examines three deep learning single-image super-resolution (SR) techniques on probe-based confocal laser endomicroscopy (pCLE) images
* A video-registration based method is proposed to estimate ground-truth high-resolution (HR) pCLE images (this can be regarded as the main objective of the paper)

Highlights:
* The paper emphasises that this is the first work to address the image resolution problem in pCLE image acquisition
* The paper introduces useful information on how pCLE devices work
* Strong related work 
* Clear story
* Comprehensive evaluation

Main Idea:
* Use video-registration based techniques to estimate the HR images (a real ground-truth HR image is not available)
* Simulate low-resolution (LR) images from the estimated HR images with the help of a Voronoi diagram and Delaunay-based linear interpolation.
* Train an exemplar-based SR model (EBSR -- DL-based approach) to learn the mapping between the simulated LR and estimated HR images.
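
The Voronoi-based simulation step can be illustrated with a small standard-library sketch. It is a simplification under stated assumptions, not the authors' pipeline: the function name `simulate_lr`, the `fibre_centres` argument (hypothetical fibre positions in pixel coordinates) and the Gaussian noise model are illustrative, and the result is painted back as piecewise-constant cells instead of using Delaunay-based linear interpolation.

```python
import random


def simulate_lr(hr, fibre_centres, noise_sigma=0.0, seed=0):
    """Simulate an LR image by averaging an HR image over Voronoi cells.

    hr            -- 2D list of intensities (the estimated HR image)
    fibre_centres -- list of (y, x) positions, one per fibre (hypothetical)
    noise_sigma   -- std. dev. of Gaussian noise added per cell (assumed model)
    """
    rng = random.Random(seed)
    h, w = len(hr), len(hr[0])
    n = len(fibre_centres)

    def nearest(y, x):
        # The nearest centre defines which Voronoi cell a pixel belongs to.
        return min(range(n),
                   key=lambda i: (fibre_centres[i][0] - y) ** 2
                   + (fibre_centres[i][1] - x) ** 2)

    labels = [[nearest(y, x) for x in range(w)] for y in range(h)]

    # Accumulate the HR intensities that fall into each cell.
    sums, counts = [0.0] * n, [0] * n
    for y in range(h):
        for x in range(w):
            sums[labels[y][x]] += hr[y][x]
            counts[labels[y][x]] += 1

    # One noisy signal value per fibre.
    signal = [(sums[i] / counts[i] if counts[i] else 0.0)
              + rng.gauss(0.0, noise_sigma) for i in range(n)]

    # Paint each cell with its fibre signal (no Delaunay interpolation here).
    return [[signal[labels[y][x]] for x in range(w)] for y in range(h)]
```

With two fibre centres placed over the left and right halves of a 2x4 image and `noise_sigma=0.0`, each half of the output takes the mean intensity of the corresponding half of the input.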

Methodology Details:
* To estimate the HR images, a video-registration based mosaicking technique (by the same authors in MIA 2006) is used, which fuses a collection of input images by averaging the temporal information.
* Since mosaicking generates a single large field-of-view mosaic image from the LR images, the mosaic-to-image diffeomorphic spatial transformation that results from the mosaicking process is used to propagate and crop the fused information from the mosaic back into each input LR image space.
* At this point, the authors observe that the misalignment between the input LR images (used in the video-registration based mosaicking technique) and the estimated HR images causes training problems for the EBSR model. So, they treat the HR images as realistic and choose to simulate LR images from them!
* Simulated LR images are obtained using the Voronoi diagram (averaging over each Voronoi cell on the HR image) + additive noise on the estimated HR images.
* Finally, they build two experimental datasets, 1) LR_org and HR and 2) LR_synth and HR, and train three CNN SR models on these two datasets.
* The three models trained are FSRCNN, EDSR and SRGAN
* The networks are trained using L1+SSIM loss functions
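
A combined L1 + SSIM objective could look roughly like the sketch below. The summary does not say how the two terms are weighted or how SSIM is windowed, so everything here is an assumption: `alpha` is a hypothetical mixing weight, SSIM is computed once from global image statistics instead of over sliding windows, and the SSIM term enters as `1 - SSIM` so that lower is better.

```python
def _flatten(img):
    return [float(v) for row in img for v in row]


def l1_loss(pred, target):
    """Mean absolute difference between two images (2D lists)."""
    p, t = _flatten(pred), _flatten(target)
    return sum(abs(a - b) for a, b in zip(p, t)) / len(p)


def global_ssim(x, y, data_range=1.0):
    """SSIM from global statistics (a simplification of windowed SSIM)."""
    a, b = _flatten(x), _flatten(y)
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    var_a = sum((v - mean_a) ** 2 for v in a) / n
    var_b = sum((v - mean_b) ** 2 for v in b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / n
    # Standard SSIM stabilizing constants.
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    return ((2 * mean_a * mean_b + c1) * (2 * cov + c2)) / (
        (mean_a ** 2 + mean_b ** 2 + c1) * (var_a + var_b + c2))


def l1_ssim_loss(pred, target, alpha=0.5, data_range=1.0):
    """Weighted sum of L1 and (1 - SSIM); `alpha` is an assumed weight."""
    return (alpha * l1_loss(pred, target)
            + (1 - alpha) * (1 - global_ssim(pred, target, data_range)))
```

The loss is zero for identical images and grows as the prediction departs from the target in either intensity or structure.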

Experiment Notes:
* SSIM and GCF (Global Contrast Factor) are used to quantitatively assess the performance of the models.
* A composite score is also used to take SSIM and GCF into account jointly
* In the ideal case, when the models are trained and tested on simulated LR and HR images, the quantitative results are convincing.
* "From this experiment, it is possible to conclude that the proposed solution is capable of performing SR reconstruction when the models are trained on synthetic data with no domain gap at test time"
* When the models are trained and tested on original LR and estimated HR images, the performance is not satisfactory
* When the models are trained on simulated LR images and tested on original LR images, the results improve compared to the previous case
* For a solid conclusion, an MOS (mean opinion score) study was carried out, with the models trained on simulated LR images.
