Master's student in Autonomous Systems. My current area of research is Deep Learning.

Unsupervised Segmentation of Color-Texture Regions in Images and Video

Deng, Yining and Manjunath, B. S.

IEEE Transactions on Pattern Analysis and Machine Intelligence - 2001 via Local Bibsonomy

Keywords: dblp


**Introduction**

Object segmentation methods often produce imprecise results, because objects frequently do not coincide with homogeneous regions. This paper therefore proposes JSEG, a method that segments images and videos into regions that are homogeneous in their color and texture cues. The method assumes that:

* The image contains homogeneous color-texture regions
* Colors can be quantized into a small number of classes
* Neighboring regions have distinct colors

**Related work**

* Existing image segmentation methods require estimating texture model parameters, which in turn often requires homogeneous regions to produce good estimates.
* An existing technique segments using motion. However, it is unreliable on noisy data, its affine transformations are insufficient for close-up motion, and it produces errors in the presence of occlusion.

**Approach**

* The method consists of two stages: color quantization and spatial image segmentation.
* Colors are quantized into a small number of classes using the Lloyd algorithm, with pixels weighted individually, so that regions can be distinguished.
* The quantized colors are assigned labels; the resulting class map also defines the composition of textures.

https://i.imgur.com/2AlFD7Z.png

* In the example class map, labels are drawn with three symbols: *, + and o.
* The symbols indicate where segmentation lines need to be drawn. For example, a class map whose left half contains + symbols and whose right half contains a uniform mixture of \* and o symbols can be segmented into two regions: one with the + symbols and one with the \* and o symbols.
* From the class map, the value J is computed from two variances: the total spatial variance of pixel positions and the spatial variance within each class.
* J is small when the color classes are uniformly distributed over the window and large when they are well separated.
* The definition of J thus captures the spatial state of the class labels and indicates where segmentation lines could be drawn.
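As a concrete illustration of the criterion above, here is a minimal sketch (not the authors' code) that computes J for a window of class labels from the total and within-class spatial variances of pixel positions:

```python
import numpy as np

def j_value(class_map):
    """JSEG's J criterion for a window of class labels:
    J = (S_T - S_W) / S_W, where S_T is the total spatial variance of
    pixel positions and S_W is the spatial variance within each class.
    Uniformly mixed classes give J near 0; separated classes give large J."""
    ys, xs = np.indices(class_map.shape)
    pts = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    labels = class_map.ravel()
    s_t = ((pts - pts.mean(axis=0)) ** 2).sum()  # total variance S_T
    s_w = 0.0                                    # within-class variance S_W
    for c in np.unique(labels):
        p = pts[labels == c]
        s_w += ((p - p.mean(axis=0)) ** 2).sum()
    return (s_t - s_w) / s_w
```

A half-and-half class map yields a larger J than a checkerboard-like mixture, matching the intuition that J is large where a segmentation line should be drawn.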
* Within each segmented region, the mean of J is calculated; minimizing the average J over regions is the criterion for segmenting the image for a given number of regions.
* In a good segmentation, the mean of J is small, since each region contains a uniform distribution of only a few color classes.
* The spatial segmentation algorithm has several stages: calculating local J values at each scale, growing regions from seeds, and merging regions. Described as follows:

https://i.imgur.com/uZzgrOO.png

* Local J values are used because they indicate whether a pixel lies in the interior of a region or near a region boundary.
* Windows are used to detect structure at different scales: large windows capture texture boundaries, while small windows capture color or intensity edges.
* Multiple window sizes are used, the smallest being a circular window with a diameter of 9 pixels.
* To grow regions from seeds, the seeds are determined first: the mean and standard deviation of the local J values are computed, a threshold is set as the mean plus the standard deviation multiplied by a preset factor, and a connected area becomes a seed once it is larger than a size predetermined for each window scale.
* Seeds are then grown by: removing holes from the fixed seeds, averaging local J values in the unsegmented area and assigning pixels adjacent to only one seed to that seed's region, computing J values at the next smaller scale, repeating the averaging on the remaining unsegmented area, and finally growing the regions at the smallest scale.
* Region merging is based on color similarity. Since colors are quantized into histogram bins, the distance between two region histograms is computed as the Euclidean distance in CIE LUV color space.
* To merge regions, the distances are sorted and the pair with the smallest distance is joined first.
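The merging step can be sketched as follows; the greedy loop and helper names are illustrative assumptions, not the paper's implementation (which compares CIE LUV-quantized color histograms of neighboring regions):

```python
import numpy as np

def histogram_distance(h1, h2):
    # Euclidean distance between two normalized quantized-color histograms
    h1 = h1 / h1.sum()
    h2 = h2 / h2.sum()
    return np.sqrt(((h1 - h2) ** 2).sum())

def merge_closest(regions, threshold):
    """Greedily merge the closest pair of region histograms until the
    smallest pairwise distance exceeds the threshold (illustrative sketch)."""
    regions = [np.asarray(r, dtype=float) for r in regions]
    while len(regions) > 1:
        best = None
        for i in range(len(regions)):
            for j in range(i + 1, len(regions)):
                d = histogram_distance(regions[i], regions[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break  # all remaining regions are sufficiently distinct
        merged = regions[i] + regions[j]  # pool the color histograms
        regions = [r for k, r in enumerate(regions) if k not in (i, j)] + [merged]
    return regions
```

With a small threshold only near-identical histograms merge; raising it collapses more regions into one.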
A new feature vector is then computed, and the merging and feature-vector updates iterate until the distance threshold is reached.

* JSEG can be applied to video by using object motion as an indirect constraint for tracking and segmentation. It is assumed that the video has already been partitioned into shots and that each shot is a continuous scene.
* The video is decomposed in the spatiotemporal domain: consecutive frames are grouped and segmented together.
* In this paper, 1000 frames are grouped and color-quantized together to generate the class maps.
* Regions in consecutive frames whose color textures are close to each other are counted as one object.
* After seeds are fixed in the frames, tracking proceeds by assigning initial seeds, treating overlapping seeds as one, iteratively checking for overlapping seeds, and assigning a time duration to each object.
* To reduce the number of false merges, a tracking value of J is computed from the mean and standard deviation between two frames; it is small when the region is static and large otherwise.
* The running time of JSEG on video equals that of image segmentation applied to the grouped frames.
* Overall, the parameters that need to be adjusted when using JSEG are the color quantization threshold, the number of image scales, and the object duration for video segmentation.
* In video segmentation, frames can on average be grouped 10 to 15 at a time.

**Paper contributions**

* The paper provides a new method, JSEG, for unsupervised object segmentation in images and videos via color quantization and spatial segmentation.
* A criterion for evaluating a good segmentation of an image is proposed.
* The final segmentation is obtained by growing regions from seed areas in the J-image.

Salient Region Detection via High-Dimensional Color Transform

Jiwhan Kim and Dongyoon Han and Yu-Wing Tai and Junmo Kim

Conference on Computer Vision and Pattern Recognition - 2014 via Local CrossRef

Keywords:


**Introduction**

* A salient region is an area in which a striking combination of image features is perceived at first observation.
* These features combine to make a region that is clearly distinct from the other areas in the image.
* This paper builds a saliency map as a linear combination of channels in a high-dimensional color representation space.

**Related work**

* Existing saliency detection work falls into two groups: methods based on low-level features and statistical learning methods. Both approaches produce a variety of results, with no clear evidence that one performs better.
* In the first group, based on low-level features, there are approaches to salient region detection based on color contrast, a Markovian approach, and multi-scale saliency based on superpixels; their respective drawbacks are sensitivity to high-contrast edges, unpreserved boundaries, and dependence on segmentation parameters.
* Learning-based methods include salient region detection based on regional descriptors, graph-based methods, and sampled patches.

**Approach**

* The paper maps the low-dimensional color spaces RGB, CIELab, and HSV into a high-dimensional color space.
* Superpixels are extracted with the SLIC method using 500 superpixels, and saliency features are computed per superpixel.
* A feature vector is defined from the superpixel locations and combined with color, from which the color space representation is computed.
* Color histogram features use 8 bins per histogram, and histogram distances are computed with the chi-squared distance.
* Global and local contrast features use Euclidean distances with a variance parameter of 0.25.
* A histogram-of-gradients method with 31 dimensions extracts shape and texture features within each superpixel.
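The chi-squared histogram distance used for the 8-bin color histograms can be sketched as below; this is the standard symmetric formulation, and the paper's exact normalization may differ:

```python
import numpy as np

def chi_squared_distance(h1, h2, eps=1e-12):
    """Symmetric chi-squared distance between two feature histograms,
    e.g. the 8-bin color histograms of two superpixels.
    eps guards against division by zero in empty bins."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))
```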
Since backgrounds tend to be more blurred at the pixel level, background separation uses a singular value feature, whose algorithm follows the concept of eigenimages, with weights obtained via Singular Value Decomposition.

* A 75-dimensional feature vector is then obtained for saliency detection. It combines the feature maps from all the superpixel operations mentioned: location features, color features, color histogram features, color contrast features, and shape and texture features.
* A regression algorithm is then applied to the feature vectors. For large training sets (2000 images per dataset), the best-performing approach is a random forest; 200 trees with an unlimited number of nodes are used.
* To construct the transformation into high-dimensional color, a trimap is built by dividing the initial saliency map into 2x2, 3x3, and 4x4 subregions. Seven levels of adaptive thresholding are then applied to each subregion, producing a 21-level locally thresholded map.
* A global threshold over these levels constructs the trimap: pixels reaching 18 or more levels are set to 1 (foreground), pixels at 6 or fewer levels are set to 0 (background), and the rest are left unknown.

https://i.imgur.com/4eF1UZd.png

* High-dimensional color is used because it combines the benefits of several color representations: nonlinear RGB and its gradients, the CIELab representation, and the saturation and hue of HSV are combined, producing 11 color channels.
* Linear RGB values are not included because they overlap with the YUV/YIQ color spaces.
* Gamma corrections in the range 0.5 to 2, at 0.5 intervals, are then applied, generating a 44-dimensional high-dimensional color vector.
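The channel-stacking idea can be sketched as follows. Note the assumptions: to stay dependency-free this sketch uses only 7 base channels (RGB, RGB gradients, saturation), omitting the paper's CIELab and hue channels, so it yields 28 channels per pixel rather than the paper's 44 (11 base channels times 4 gamma corrections):

```python
import numpy as np

def high_dim_color(img):
    """Sketch of a high-dimensional color transform: stack several
    nonlinear color channels, then apply gamma corrections of
    0.5, 1.0, 1.5, 2.0 to every channel. Illustrative only; the
    paper's 11 base channels include CIELab and hue as well."""
    img = img.astype(float) / 255.0
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    # horizontal image gradients of each RGB channel (finite differences)
    grads = [np.abs(np.gradient(c)[1]) for c in (r, g, b)]
    # HSV-style saturation channel
    mx, mn = img.max(axis=-1), img.min(axis=-1)
    sat = np.where(mx > 0, (mx - mn) / np.maximum(mx, 1e-12), 0.0)
    base = np.stack([r, g, b, *grads, sat], axis=-1)  # 7 base channels here
    gammas = [0.5, 1.0, 1.5, 2.0]
    return np.concatenate([base ** gm for gm in gammas], axis=-1)
```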
* The background and foreground separated in the trimap are then used to estimate the linear combination of color coefficients by solving a least-squares problem between two matrices: a binary vector in which 0 represents background and 1 represents foreground, and the color samples multiplied by a coefficient vector.
* The saliency map is then built by applying the estimated coefficient vector to the color samples. The whole method is iterated three times to construct a stable saliency map.
* The map is then refined with spatial information by giving more weight to pixels near the foreground region, using an exponential with parameter 0.5 applied to the minimum Euclidean distances to the foreground and background. This is summarized in the image below:

https://i.imgur.com/k9heDiw.png

* The algorithm is evaluated on three datasets: 5000 images in the MSRA dataset, 1000 images in the ECSSD dataset, and 643 images with multiple objects in the Interactive Co-segmentation dataset.
* Other salient region detection algorithms are compared using precision and recall measurements: LC, HC, LR, SF, HS, GMR, and DRFI.
* Performance is evaluated with the F-measure, $F_\beta = \frac{(1+\beta^2)\,\mathrm{Precision} \times \mathrm{Recall}}{\beta^2\,\mathrm{Precision} + \mathrm{Recall}}$, with $\beta^2 = 0.3$.
* The algorithm places second best among all compared methods.

**Notes on The Paper**

* The paper provides an algorithm that performs well for color-based saliency detection by generating high-dimensional features from low-level features.
* In the high-dimensional color space, salient regions can be separated from the background (Rahman, I. et al., 2016).
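The coefficient estimation step reduces to an ordinary least-squares problem; here is a minimal sketch with hypothetical function names (not the authors' code):

```python
import numpy as np

def estimate_coefficients(color_samples, trimap_labels):
    """Solve min_a ||X a - y||^2, where each row of X is a pixel's
    high-dimensional color vector and y is 1 for trimap foreground
    pixels and 0 for background pixels."""
    a, *_ = np.linalg.lstsq(color_samples, trimap_labels, rcond=None)
    return a

def saliency_from_coefficients(color_samples, a):
    # each pixel's saliency is the linear combination X a
    return color_samples @ a
```

In the full method this fit is repeated three times, with the trimap refreshed from the intermediate saliency map each round.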
* If the algorithm were further developed with a classifier, it could integrate richer features, since a higher-dimensional space is available (Borji, A. et al., 2017).

Adaptive and Generic Corner Detection Based on the Accelerated Segment Test

Mair, Elmar and Hager, Gregory D. and Burschka, Darius and Suppa, Michael and Hirzinger, Gerd

European Conference on Computer Vision - 2010 via Local Bibsonomy

Keywords: dblp


**Introduction:**

Corners, as feature cues in an image, are defined by the intersection of two edges. This definition has the benefit of allowing precise localization of the cue, although it is only valid when locality is maintained and the result is similar to the real corner location.

**Related work:**

* Existing detectors include SIFT, a global tracker that uses Difference of Gaussians, and SURF, which uses Haar wavelets to approximate the Hessian determinant. These methods have the drawback of high computational cost.
* FAST is a corner detector that performs better than conventional corner detectors such as the Harris detector. Its drawback is that it depends on the environment, so the decision tree always needs to be constructed from scratch (using the greedy ID3 algorithm).

**Approach:**

* To detect corners, pixels on a discretized circle are compared with the center pixel; a circle with a diameter of 3.4 pixels is used as the test mask.
* According to the accelerated segment test, a pixel is identified as a corner when a number of circle pixels differ from the center pixel, all brighter or all darker, by more than a threshold.
* The number of pixels used in this paper is the same as in the FAST-9 method, i.e., a segment of size nine, as it gives the best performance.
* This number detects corners with the highest repeatability when different viewpoints are applied.
* The FAST algorithm builds a ternary tree with three possible states (darker, brighter, similar), plus an unknown state, which leads to a $4^N$ configuration space; FAST-9 differs in that the circle's thickness is increased to 3 pixels.
* The proposed corner detector evaluates one pixel at a time, posing one question per test.
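The corner criterion itself, the accelerated segment test with a FAST-9-style arc, can be sketched as follows; the intensity values and the 16-pixel circle layout in the usage are illustrative:

```python
def is_corner(center, circle, threshold, arc_len=9):
    """Accelerated segment test (sketch): a pixel is a corner if at
    least `arc_len` contiguous pixels on the surrounding circle are
    all brighter than center + threshold or all darker than
    center - threshold. `circle` lists the circle intensities in order."""
    n = len(circle)
    for sign in (+1, -1):  # +1 checks a brighter arc, -1 a darker arc
        run = 0
        # walk the circle twice so wrap-around arcs are counted
        for i in range(2 * n):
            if sign * (circle[i % n] - center) > threshold:
                run += 1
                if run >= arc_len:
                    return True
            else:
                run = 0
    return False
```

The decision-tree machinery in the paper exists purely to evaluate this criterion with as few pixel comparisons as possible.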
Given the answer for one pixel, the next pixel and question in the query are determined by the response.

* The algorithm expands the configuration space by adding the states "not brighter" and "not darker", which produces a binary tree representation in which each node can be evaluated; the configuration space then has size $6^N$.
* Memory access has three types of cost: the second comparison of a pixel, a test of a pixel in the same row, and a test of any other pixel.
* To make the decision tree optimal, a method resembling backward induction is formulated. The configuration space is explored using depth-first search, and each leaf is labeled according to whether it satisfies the corner criterion of the accelerated segment test.
* The cost of each inner node is computed from the minimum, over possible tests, of the costs of the positive and negative child, weighted by the pixel-state probabilities of parent and child.
* The algorithm estimates the probability that an image area is homogeneous or structured and then builds a decision tree accordingly. The probability distribution is parameterized by the mirror-state and similar-state probabilities.

https://i.imgur.com/lIHh7gL.png

* The algorithm is made generic by jumping from one optimized tree to another: when evaluation terminates at a leaf, the detector switches to the tree corresponding to that leaf's configuration.
* This switching has no cost, as it happens in a leaf; however, it delays adaptation by one test. Thus, AGAST can only perform worse than FAST when it needs to jump between the trees for homogeneous and heterogeneous pixels consecutively.
* Corner detection is compared using three pixel mask sizes: 4, 12, and 16.
The comparison is done by applying Gaussian noise and blur to a database of checkerboard images taken from a variety of viewpoints.

* Because computation on conventional computers is limited, the four-state configuration space is used with three different arc lengths: 9, 10, and 11.
* As the mask and arc length are enlarged, more features are found; a small arc better localizes the real corner.
* A large pattern leads to slower computation, as more tests are needed to detect a corner or evaluate features, and it requires more memory.
* A smaller pattern can also eliminate feature detections when features lie close to each other; therefore, for smaller patterns, the post-processing step is removed.
* When Gaussian blur and noise are added to the database, the combination of a 16-pixel mask and an arc length of 9 is the most robust against the disturbance; repeatability is therefore controlled by the arc length.
* The decision trees are also evaluated by computing the response time of corner detection, i.e., the number of tests over all possible pixel arrangements in a mask.
* Pixel arrangements with close similarities are grouped. Observing the standard deviation, groups with a large number of alike pixels have unbalanced decision trees; this happens because the possible pixel arrangements are limited. Observed as follows:

https://i.imgur.com/DLqCXhy.png

* However, when a single tree is used, the adaptive tree still performs better than the conventional method.
* Comparing trees with different weights, the tree-jumping algorithm AGAST is optimal when the weight values are 0.1 and 1/3.
* The performance of AGAST is also tested against FAST-9, where uniform probability distributions are used to build the trees.
* Both algorithms are tested on five scenes: laboratory, outdoor, indoor, aerial, and medical.
The optimized tree speeds corner detection up by about 13%, while AGAST achieves speedups from 23% to over 30%.

**Paper contributions**

* The paper provides an improved FAST corner detector that dynamically adapts to its environment while processing the image input.
* The AGAST method improves on its predecessors in both the time spent processing an image and the memory used.
* It is also able to find more keypoints in corner detection.
