Main Purpose:
* The main goal of the proposed method is to compute a saliency map by exploiting a global perception mechanism, figure-ground segregation, together with the Boolean Map Theory of visual attention.

Drawbacks of previous works:
* Most previous works do not exploit the topological structure of an image when computing saliency. This paper aims to use the topological structure of a scene for that purpose.

Main Idea:
* According to the Boolean Map Theory of visual attention, an observer’s momentary conscious awareness of a scene can be represented by a Boolean map. Given an input image, a set of Boolean maps can be generated by randomly thresholding its feature channels, e.g. the color channels.
* Once the set of Boolean maps is generated, concepts from figure-ground segregation, introduced by Gestalt psychological studies, can be used to create a saliency map. As these studies suggest, figures are more likely to be attended to than background elements.
* But how are figures characterized? Several factors are likely to influence figure-ground segregation, such as size, surroundedness and symmetry. The proposed method uses the surroundedness factor to form a saliency map. This factor implies that figures tend to be surrounded, that is, they are likely to have a closed outer contour.
* This factor can therefore be evaluated over the Boolean maps: each region that has a closed outer contour and is not connected to the image borders gets a higher attention value. Each Boolean map thus results in an attention map, and the attention maps are then summed linearly to form a saliency map.

In brief, the contribution of the proposed method is twofold:
1) It uses Boolean maps to characterize an image.
2) It exploits figure-ground segregation concepts to form a saliency map.
Implementation Details:
* Given an input image, a set of Boolean maps is generated by thresholding the color channels (Lab) with uniformly selected thresholds from 0 to 255.
* Each Boolean map is then evaluated for surroundedness. Regions that are surrounded (closed outer contour) get the value 1 and all others 0. The resulting maps are called attention maps.
* After some normalization steps, the attention maps (one per Boolean map) are summed linearly to form a mean attention map.
* Further normalization and post-processing steps are applied to obtain the final saliency map.

Conclusion:
* In this work, a novel Boolean Map based Saliency (BMS) model is proposed to leverage the surroundedness cue that helps in figure-ground segregation. The model borrows the concept of a Boolean map from the Boolean Map Theory of visual attention and characterizes an image by a set of Boolean maps. This representation leads to an efficient algorithm for saliency detection. The authors report that BMS is the only model that consistently achieves state-of-the-art performance on five benchmark eye tracking datasets, and it is also shown to be useful in salient object detection.
Main purpose:
* This work proposes a software-based resolution augmentation method which is more agile and simpler to implement than hardware engineering solutions.
* The paper examines three deep learning single image super resolution (SR) techniques on pCLE images.
* A video-registration based method is proposed to estimate ground truth HR pCLE images (this can be regarded as the main objective of the paper).

Highlights:
* The paper emphasises that this is the first work to address the image resolution problem in pCLE image acquisition.
* The paper gives useful background on how pCLE devices work.
* Strong related work.
* Clear story.
* Comprehensive evaluation.

Main Idea:
* Use video-registration based techniques to estimate the HR images (a real ground truth HR image is not available).
* Simulate LR images from the estimated HR images with the help of a Voronoi diagram and Delaunay-based linear interpolation.
* Train an Exemplar-based SR model (EBSR, a DL-based approach) to learn the mapping between the simulated LR and estimated HR images.

Methodology Details:
* To estimate the HR images, a video-registration based mosaicking technique (by the same authors in MIA 2006) is used, which fuses a collection of input images by averaging the temporal information.
* Since mosaicking generates a single large field-of-view mosaic image from the LR images, the mosaic-to-image diffeomorphic spatial transformation resulting from the mosaicking process is used to propagate and crop the fused information from the mosaic back into each input LR image space.
* At this point, the authors observe that the misalignment between the input LR images (used in the video-registration based mosaicking technique) and the estimated HR images causes training problems for the EBSR model. So they treat the estimated HR images as realistic and choose to simulate the LR images from them.
* Simulated LR images are obtained from the estimated HR images using the Voronoi diagram (averaging the HR image over each Voronoi cell) plus additive noise.
* Finally, they build two experimental datasets, 1) LR_org and HR and 2) LR_synth and HR, and train three CNN SR models on these two datasets.
* They train FSRCNN, EDSR and SRGAN.
* The networks are trained using L1+SSIM loss functions.

Experiment Notes:
* SSIM and GCF are used to quantitatively assess the performance of the models.
* A composite score is also used to take SSIM and GCF into account jointly.
* In the ideal case, when the models are trained and tested on simulated LR and HR images, the quantitative results are convincing.
* "From this experiment, it is possible to conclude that the proposed solution is capable of performing SR reconstruction when the models are trained on synthetic data with no domain gap at test time"
* When the models are trained and tested on original LR and estimated HR images, the performance is unsatisfactory.
* When the models are trained on simulated LR images and tested on original LR images, the results improve compared to the previous case.
* For a solid conclusion, an MOS study was carried out. The models are trained on simulated LR images.
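The LR simulation step described under Methodology Details (averaging the estimated HR image over each Voronoi cell and adding noise) can be illustrated with a small sketch. Everything named here is an assumption for the example: the function `simulate_lr`, the `fibres` list of fibre-centre coordinates, and the Gaussian noise model with `noise_sigma`. The sketch also fills each cell with its fibre signal (a piecewise-constant reconstruction) rather than the Delaunay-based linear interpolation used in the paper.

```python
import random


def simulate_lr(hr, fibres, noise_sigma=0.0, seed=0):
    """Simulate a fibre-bundle LR image from an HR image (illustrative only).

    hr      : 2D list of pixel intensities
    fibres  : list of (y, x) fibre-centre positions (hypothetical layout)
    Returns (signals, lr), where signals[i] is the noisy mean intensity of
    Voronoi cell i and lr is the piecewise-constant image built from them.
    """
    rng = random.Random(seed)
    h, w = len(hr), len(hr[0])

    # Discrete Voronoi diagram: each pixel belongs to its nearest fibre centre.
    owner = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            owner[y][x] = min(
                range(len(fibres)),
                key=lambda i: (y - fibres[i][0]) ** 2 + (x - fibres[i][1]) ** 2,
            )

    # Average the HR intensities inside each cell.
    sums = [0.0] * len(fibres)
    counts = [0] * len(fibres)
    for y in range(h):
        for x in range(w):
            sums[owner[y][x]] += hr[y][x]
            counts[owner[y][x]] += 1

    # One noisy signal per fibre; cells that own no pixel are set to 0.
    signals = [
        (s / c + rng.gauss(0.0, noise_sigma)) if c else 0.0
        for s, c in zip(sums, counts)
    ]

    # Piecewise-constant reconstruction (the paper interpolates instead).
    lr = [[signals[owner[y][x]] for x in range(w)] for y in range(h)]
    return signals, lr


# Toy HR image: dark left half, bright right half, two fibres.
hr_image = [[0.0, 0.0, 10.0, 10.0] for _ in range(4)]
fibre_centres = [(1.5, 0.5), (1.5, 2.5)]
signals, lr_image = simulate_lr(hr_image, fibre_centres, noise_sigma=0.0)
```

Because the simulated LR image is derived directly from the HR image, the two are perfectly aligned by construction, which is the property the authors rely on to avoid the misalignment problem seen with the original LR images.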