Summary by Anmol Sharma 5 years ago
Object detection in 2D scenes has mostly been performed using model-based approaches, which model the appearance of certain objects of interest. Although such approaches tend to work well in cluttered, noisy and occluded settings, their failure to adapt to intra-object variability, which is apparent in domains like medical imaging where organ shapes vary considerably, has led to a need for a more robust approach. To this end, Cootes et al. propose a training-based method which adapts and deforms well to per-object variations according to the training data, but still maintains rigidity across different objects.
The proposed method relies on a hand-labelled training set featuring a set of points called "landmark points" that mark specific positions on an object. For example, for a face the points may be "nose end, left eyebrow start, left eyebrow mid, left eyebrow end" and so on. Next, the landmark points across the whole training set are aligned using affine transformations by minimizing a weighted sum of squared differences (SSD) between corresponding landmark points amongst training examples. The optimization function (SSD) is weighted using the apparent variance of each landmark point: the higher the variance across training samples, the lower the weight. In order to "summarize" the shape in the high-dimensional space of landmark point vectors, the proposed method uses Principal Component Analysis (PCA). PCA yields eigenvectors that point along the directions of greatest variation of the points in $2n$-dimensional space (each shape is a vector of $n$ landmarks with two coordinates each), while the corresponding eigenvalues give the significance of each eigenvector. The $t$ eigenvectors with the largest eigenvalues are chosen such that they explain a chosen percentage of the variance of the data. Once this is done, the model can produce new shapes as deviations from the mean shape of the object, using the equation:
$x = \bar{x} + Pb$
where $\bar{x}$ is the mean shape, $P$ is the matrix of the $t$ eigenvectors and $b$ is a vector of free weights that can be tuned to generate new shapes. The values of $b$ are constrained to stay within boundaries determined from the training set, which is the basis of the argument that the model only deforms in ways seen in the training set.
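The model-building and shape-generation steps can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes the training shapes are already aligned and stacked as rows of $2n$ coordinates, and the names `build_shape_model`, `generate_shape`, `variance_kept` and `k` are hypothetical. Limiting each weight to $\pm k\sqrt{\lambda_i}$ with $k = 3$ is one common way to derive the bounds on $b$ from the training set and is an assumption here.

```python
import numpy as np

def build_shape_model(shapes, variance_kept=0.95):
    """Build a PCA shape model from aligned shapes.

    shapes: (N, 2n) array, one aligned landmark vector per row (assumed layout).
    Returns the mean shape, the (2n, t) eigenvector matrix P and the
    t largest eigenvalues.
    """
    shapes = np.asarray(shapes, dtype=float)
    mean = shapes.mean(axis=0)
    cov = np.cov(shapes - mean, rowvar=False)
    # eigh returns eigenvalues in ascending order; re-sort to descending.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)  # drop tiny negative round-off
    eigvecs = eigvecs[:, order]
    # Keep the smallest t whose cumulative variance reaches the target.
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    t = min(int(np.searchsorted(cumulative, variance_kept)) + 1, len(eigvals))
    return mean, eigvecs[:, :t], eigvals[:t]

def generate_shape(mean, P, eigvals, b, k=3.0):
    """Compute x = mean + P b, with b clamped to +/- k * sqrt(eigenvalue)."""
    limit = k * np.sqrt(eigvals)  # assumed bound, not taken from the summary
    b = np.clip(np.asarray(b, dtype=float), -limit, limit)
    return mean + P @ b
```

Passing a zero vector for `b` reproduces the mean shape, and sweeping a single entry of `b` between its limits while holding the others at zero shows the deformation captured by that eigenvector.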
The method was tested on a variety of shapes, namely resistor models in electric circuits, a heart model, a worm model, and a hand model. The models thus generated were robust and could successfully generate new examples by varying the values of $b$ along a straight line. However, for the worm model it was found that varying the values of $b$ only along a line may not always be suitable, especially in cases where the different dimensions of $b$ have some existing non-linear relationship.
Once a shape model is generated, it is used to detect objects/shapes in new images. This is done by first initializing the model points on the image. The model points are then adjusted to the shape using information from the image, such as edges. The adjustment is performed iteratively, applying constraints to the calculated values of $dX$ and $db$ so that they respect the training set. The iterations continue until the model points converge to the actual shape of interest in the image.
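The constraint step of this loop can be illustrated with a simplified sketch. It is an assumption-laden outline rather than the procedure from the paper: pose (translation, rotation, scale) is ignored, so all points are taken to live in the model's own coordinate frame, and the image search is abstracted as a user-supplied callable `suggest_points` that proposes new landmark positions (for example, by moving towards nearby edges). The function names, the tolerance and the $\pm 3\sqrt{\lambda_i}$ limit are hypothetical choices.

```python
import numpy as np

def constrain_to_model(x, mean, P, eigvals, k=3.0):
    """Project a candidate shape onto the model and clamp the weights b."""
    b = P.T @ (np.asarray(x, dtype=float) - mean)
    limit = k * np.sqrt(eigvals)  # assumed bound on each weight
    b = np.clip(b, -limit, limit)
    return mean + P @ b, b

def fit_shape(x0, suggest_points, mean, P, eigvals, max_iters=50, tol=1e-6):
    """Alternate between image-driven suggestions and model constraints.

    suggest_points: hypothetical callable mapping the current landmark
    vector to proposed positions derived from image evidence.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        target = suggest_points(x)
        x_new, _ = constrain_to_model(target, mean, P, eigvals)
        if np.linalg.norm(x_new - x) < tol:  # model points stopped moving
            return x_new
        x = x_new
    return x
```

Because every suggestion is projected back onto the retained eigenvectors and clamped, the fitted points can only take shapes the model allows, even when the image evidence pulls individual landmarks elsewhere.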
One drawback of the proposed method is its high sensitivity to noise in the training data annotations. In addition, the relationship between the components of $b$ is not entirely clear, and the model may suffer when a non-linear relationship exists between them. Finally, convergence is somewhat dependent on the initialization of the model points and on local edge features for guidance, which may fail in some instances.