**Introduction**
Object segmentation methods often produce imprecise results because object boundaries do not always coincide with homogeneous regions. This paper therefore proposes JSEG, a method that segments images and videos into regions that are homogeneous in terms of color and texture cues. The method assumes that:
* The image contains a set of approximately homogeneous color-texture regions
* The color information in each region can be represented by a small set of quantized colors
* The colors of neighboring regions are distinguishable from one another
**Related work**
* Existing image segmentation work requires estimating texture model parameters, and reliable estimation in turn requires homogeneous regions, which is exactly what segmentation is supposed to produce.
* Motion-based segmentation techniques also exist, but they are unreliable on noisy data, the affine motion model is insufficient for close-up motion, and they make errors in the presence of occlusion.
**Approach**
* The method consists of two stages: color quantization followed by spatial segmentation.
* Colors are quantized into a small number of representative classes that can be used to distinguish regions, with each pixel weighted individually, using the Lloyd algorithm (a stand-in sketch of this step follows the figure below).
* Each pixel is then replaced by the label of its quantized color. The resulting label image, called the class-map, captures the composition of textures in the image.
https://i.imgur.com/2AlFD7Z.png
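Below is a minimal sketch of the quantization step. It uses plain k-means in CIE LUV space as a simple stand-in for the paper's perceptually weighted Lloyd quantizer; the function name, the use of scikit-image/scikit-learn, and `n_classes=16` are assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from skimage import color          # RGB -> CIE LUV conversion
from sklearn.cluster import KMeans

def quantize_to_class_map(rgb_image, n_classes=16):
    """Quantize an RGB image into a few color classes and return the class-map.

    Plain k-means is used here as a stand-in for the perceptually
    weighted Lloyd quantizer described in the paper.
    """
    luv = color.rgb2luv(rgb_image)                # work in CIE LUV space
    pixels = luv.reshape(-1, 3)
    kmeans = KMeans(n_clusters=n_classes, n_init=4, random_state=0)
    labels = kmeans.fit_predict(pixels)
    return labels.reshape(rgb_image.shape[:2])    # H x W integer class-map
```

The class-map produced this way is what the J criterion described next operates on.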
* In the example class-maps shown above, the class labels are denoted by three symbols: *, + and o.
* The spatial distribution of these symbols indicates where segmentation boundaries should be drawn. For example, a class-map whose left half contains only + and whose right half contains a uniform mixture of * and o should be segmented into two regions: one consisting of + labels and the other of the mixed * and o labels.
* For a class-map, the total variance S_T of the pixel positions and the within-class variance S_W (the position variance around each class's own mean, summed over classes) are computed, and the criterion J = (S_T - S_W) / S_W is derived from them (a sketch of this computation follows the figure below).
* J is small when the color classes are uniformly mixed over the area and large when the classes are well separated spatially.
* J therefore quantifies how the class labels are distributed and indicates whether, and roughly where, segmentation boundaries should be drawn.
* For a candidate segmentation, J is computed separately over each segmented region and the size-weighted average of these values is used as the criterion: for a fixed number of regions, a smaller average means a better segmentation.
* In a good segmentation this average is small, because each region then contains only a few color classes that are uniformly distributed within it.
* The spatial segmentation algorithm has several stages: computing local J values to form a J-image, determining seeds and growing regions from them, repeating at progressively smaller scales, and finally merging regions. The flow is illustrated below:
https://i.imgur.com/uZzgrOO.png
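A small sketch of the J computation for a class-map (or a local window of one), directly following the definition above; the function and variable names are illustrative, not from the paper.

```python
import numpy as np

def compute_J(class_map):
    """Compute J = (S_T - S_W) / S_W for a 2-D integer class-map.

    S_T: variance of all pixel positions around the global mean position.
    S_W: sum, over classes, of the position variance around each class mean.
    Uniformly mixed classes give a small J; spatially separated classes
    give a large J.
    """
    h, w = class_map.shape
    ys, xs = np.indices((h, w))
    z = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)  # positions
    labels = class_map.ravel()

    S_T = ((z - z.mean(axis=0)) ** 2).sum()

    S_W = 0.0
    for c in np.unique(labels):
        z_c = z[labels == c]
        S_W += ((z_c - z_c.mean(axis=0)) ** 2).sum()

    return (S_T - S_W) / S_W if S_W > 0 else 0.0
```

For instance, a class-map whose left half is one class and whose right half is another yields a large J, while a random mixture of the same two classes yields a J close to zero. The local J values that make up the J-image are obtained by applying this computation inside a window around each pixel.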
* Local J values are used because they indicate whether a pixel lies in the interior of a region (low J) or near a region boundary (high J).
* Windows of different sizes are used to handle different region scales: large windows detect texture boundaries, while small windows localize color or intensity edges.
* Multiple window sizes are used; the smallest window is circular with a diameter of 9 pixels.
* To grow regions from seeds, the seeds are determined first: the mean and standard deviation of the local J values in the region are computed, a threshold is set as the mean plus the standard deviation multiplied by a preset factor, pixels below the threshold are grouped into connected candidate areas, and a candidate becomes a seed if its area is larger than a minimum size fixed for that window scale (a sketch of this step appears after this list).
* Seeds are then grown as follows: holes in the seeds are removed; the local J values in the remaining unsegmented part are averaged, and connected areas with below-average J that border exactly one seed are assigned to that seed's region; local J values are then recomputed at the next smaller scale for the remaining unsegmented pixels; the process repeats, and at the smallest scale the remaining pixels are grown into the adjacent regions.
* Region merging is based on color similarity. Since the colors have already been quantized, each region is described by a histogram over the quantized colors, and the distance between two regions is the Euclidean distance between their histograms, with the quantized colors represented in the CIE LUV color space.
* To merge regions, the pairwise distances between neighboring regions are listed and the pair with the smallest distance is merged; a new feature vector is then computed for the merged region, and the process of merging and recomputing feature vectors repeats until the smallest remaining distance exceeds a threshold (see the merging sketch after this list).
* JSEG extends to video by using object motion as an indirect constraint that couples tracking and segmentation. It assumes the video has already been partitioned into shots, each of which is a continuous scene.
* The video is processed in the spatiotemporal domain: consecutive frames are grouped and segmented together.
* In the paper, 1000 frames are grouped and their colors quantized together to generate the class-maps.
* Regions in consecutive frames whose color and texture are close to each other are treated as the same object.
* Once seeds have been determined in the frames, tracking proceeds by assigning the initial seeds, treating seeds that overlap across frames as the same object, iteratively rechecking for overlapping seeds, and assigning each object a time duration.
* To reduce the number of false merges, a tracking J value is computed between two frames from their means and standard deviations; it is small when the region is static and large otherwise.
* The running time of JSEG on video is comparable to applying the image segmentation to each group of frames.
* Overall, the parameters that need to be adjusted when using JSEG are the color quantization threshold, the number of image scales, and the object duration used in video segmentation.
* In video segmentation, frames are grouped on average in batches of 10 to 15.
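As referenced above, here is a rough sketch of the seed-determination step on a local J-image; the helper name `find_seeds`, the use of `scipy.ndimage`, and the default factor `a=0.2` are assumptions for illustration (the paper uses preset, scale-dependent values).

```python
import numpy as np
from scipy import ndimage

def find_seeds(j_image, min_seed_size, a=0.2):
    """Find seed areas in a local J-image (one local J value per pixel).

    Pixels whose local J is below mean + a * std are seed candidates;
    connected candidate areas larger than min_seed_size become seeds.
    Both `a` and `min_seed_size` are scale-dependent preset values.
    """
    threshold = j_image.mean() + a * j_image.std()
    candidates = j_image < threshold
    labeled, n_components = ndimage.label(candidates)  # connected components
    seeds = []
    for i in range(1, n_components + 1):
        mask = labeled == i
        if mask.sum() >= min_seed_size:
            seeds.append(mask)                          # boolean seed mask
    return seeds
```

And a rough sketch of the final merging stage: greedy agglomerative merging of neighboring regions by the Euclidean distance between their quantized-color histograms. The data structures, the simple histogram averaging, and the `merge_threshold` value are illustrative assumptions; the paper recomputes a proper feature vector for each merged region.

```python
import numpy as np

def merge_regions(histograms, neighbor_pairs, merge_threshold=0.1):
    """Greedily merge adjacent regions with similar color histograms.

    histograms:     dict region_id -> normalized histogram over quantized colors.
    neighbor_pairs: set of (id_a, id_b) tuples of spatially adjacent regions.
    Repeatedly merges the closest adjacent pair until the smallest
    Euclidean histogram distance exceeds merge_threshold.
    Returns a dict mapping every original region id to its final id.
    """
    final_id = {r: r for r in histograms}
    hists = dict(histograms)
    pairs = {tuple(sorted(p)) for p in neighbor_pairs}

    while pairs:
        # Find the closest pair of still-adjacent regions.
        a, b = min(pairs, key=lambda p: np.linalg.norm(hists[p[0]] - hists[p[1]]))
        if np.linalg.norm(hists[a] - hists[b]) > merge_threshold:
            break

        # Merge b into a: combine histograms (simple average as a placeholder
        # for recomputing the merged region's feature vector).
        hists[a] = (hists[a] + hists[b]) / 2.0
        del hists[b]
        final_id = {r: (a if f == b else f) for r, f in final_id.items()}
        pairs = {tuple(sorted((a if x == b else x, a if y == b else y)))
                 for x, y in pairs
                 if (a if x == b else x) != (a if y == b else y)}

    return final_id
```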
**Paper contributions**
* The paper introduces JSEG, a new unsupervised method for segmenting objects in images and videos via color quantization followed by spatial segmentation.
* A criterion, the J value, for evaluating the quality of a segmentation is proposed.
* The final segmentation is obtained by growing and merging regions based on seed areas extracted from the J-image.