## **Introduction**

This paper presents a general technique for matching a deformable model to an image through energy minimization. It proposes an active contour model, called a snake: a continuous, deformable spline that continually minimizes its energy under the influence of image forces, its own internal energy, and external constraint forces. Because snakes are dynamic, they can be applied to image segmentation tasks (edges, lines, and subjective contours) as well as to motion tracking and stereo matching.

## **Method**

A snake can be represented parametrically as

$$ {v}(s) = (x(s), y(s)) $$

The energy functional of a snake consists of internal and external energy components:

$$ E_{snake}(C) = \underbrace{E_{ext}(C)}_{\text{data term}} + \underbrace{E_{int}(C)}_{\text{regularization term}} $$

This can be decomposed further into

$$ E^{*}_{snake} = \int_{0}^{1} \Big[\: E_{int}({v}(s)) + E_{image}({v}(s)) + E_{constr}({v}(s)) \:\Big] \: ds $$

where the three terms represent, respectively, the internal energy of the snake due to stretching and bending, the image forces, and the external constraint forces.

The internal energy of the snake is

$$ E_{int} = \frac{1}{2} \{\alpha (s)\left | {v}_{s}(s) \right |^{2} \:+\: \beta (s)\left | {v}_{ss}(s) \right |^{2}\} $$

where the first-order term, weighted by $\alpha(s)$, penalizes stretching (elasticity) and the second-order term, weighted by $\beta(s)$, penalizes bending (stiffness).

Each iteration of the energy minimization takes $O(n)$ time when the semi-implicit Euler scheme is solved with sparse matrix methods. Because the internal forces are treated implicitly, the scheme tolerates much larger time steps than a fully explicit Euler method, which needs $O(n^{2})$ iterations, each costing $O(n)$, for an impulse to travel along the length of the snake.
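The semi-implicit step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a closed contour with unit spacing between points and constant $\alpha$ and $\beta$, so that the discretized internal forces form a cyclic pentadiagonal matrix $A$, and it applies the update $x_{t} = (A + \gamma I)^{-1}(\gamma x_{t-1} - \partial E_{ext}/\partial x)$, and likewise for $y$. The names `internal_matrix`, `solve`, `snake_step`, `circle`, and `ext_grad` are hypothetical. For brevity the linear system is solved with dense Gaussian elimination, which is not $O(n)$; reaching that cost would require a banded or sparse factorization of $A + \gamma I$.

```python
import math

def internal_matrix(n, alpha, beta):
    """Cyclic pentadiagonal matrix for -alpha*v_ss + beta*v_ssss on a closed
    contour of n points (assumes unit spacing and constant alpha, beta)."""
    stencil = {-2: beta, -1: -alpha - 4 * beta, 0: 2 * alpha + 6 * beta,
               1: -alpha - 4 * beta, 2: beta}
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for offset, coeff in stencil.items():
            A[i][(i + offset) % n] += coeff
    return A

def solve(M, b):
    """Solve M x = b by dense Gaussian elimination with partial pivoting.
    Used only for brevity; a banded solver would exploit the sparsity."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def snake_step(xs, ys, alpha, beta, gamma, ext_grad):
    """One semi-implicit Euler step: internal forces are implicit, external
    forces explicit. ext_grad(x, y) returns (dE_ext/dx, dE_ext/dy)."""
    n = len(xs)
    M = internal_matrix(n, alpha, beta)
    for i in range(n):
        M[i][i] += gamma  # A + gamma * I
    grads = [ext_grad(x, y) for x, y in zip(xs, ys)]
    rhs_x = [gamma * x - g[0] for x, g in zip(xs, grads)]
    rhs_y = [gamma * y - g[1] for y, g in zip(ys, grads)]
    return solve(M, rhs_x), solve(M, rhs_y)

def circle(n, radius):
    """Hypothetical initial contour: n points evenly spaced on a circle."""
    return ([radius * math.cos(2 * math.pi * i / n) for i in range(n)],
            [radius * math.sin(2 * math.pi * i / n) for i in range(n)])
```

Calling `snake_step(*circle(20, 5.0), 0.5, 0.1, 1.0, lambda x, y: (0.0, 0.0))` with no external energy should shrink the circle slightly, since the elastic term alone pulls a closed contour inward. This also shows why an image or constraint force is needed to hold the snake in place.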
The energy due to the image forces is composed of three components:

$$ E_{image} = w_{line}E_{line} + w_{edge}E_{edge} + w_{terminal}E_{terminal} $$

where

$$E_{line} = -\big( G_{\sigma} * \nabla^{2}{I} \big)^{2},$$

$$ E_{edge} = -\left | \nabla {I}(x,y) \right |^{2},$$

$$E_{terminal} = \frac{\partial \theta}{\partial {n}_{\perp}} = \frac{C_{yy}C_{x}^{2} - 2C_{xy}C_{x}C_{y} + C_{xx}C_{y}^{2}}{(C_{x}^{2} + C_{y}^{2})^\frac{3}{2}}$$

* The minima of the scale-space functional $E_{line}$ above lie on the zero-crossings of $G_{\sigma} * \nabla^{2}{I}$, which define the edges of the image. In the simplest case, where the line functional is the image intensity itself, $E_{line} = I(x,y)$, the sign of $w_{line}$ determines whether the snake aligns itself with the lightest or the darkest contour near it. Smoothing the image with a Gaussian lets the snake first reach equilibrium on a heavily blurred energy functional; as the blurring is slowly reduced, the snake keeps its shape despite small-scale textural detail because of its own smoothness constraint.
* Because of the $E_{edge}$ term, the snake is attracted to contours with large image gradients.
* Let $C(x,y) = G_{\sigma} * I(x,y)$ be a slightly smoothed version of the image and let $\theta = \tan^{-1}(C_{y}/C_{x})$ be its gradient angle. $E_{terminal}$ is the curvature of the level contours of $C(x,y)$, computed as the rate of change of $\theta$ along the unit vector ${n}_{\perp} = (-\sin\theta, \cos\theta)$ perpendicular to the gradient direction. The combination of $E_{edge}$ and $E_{terminal}$ ensures that, during energy minimization, the snake is attracted to edges and/or terminations.

Snakes can also be applied to stereo matching by adding a constraint to the energy functional. The constraint, motivated by the human visual system, is that disparity does not change too rapidly in space, meaning the following term has to be minimized, where ${v}^{L}(s)$ and ${v}^{R}(s)$ represent the left and the right snake contours respectively.
$$ E_{stereo} = \big( {v}^{L}_{s}(s) - {v}^{R}_{s}(s) \big)^{2} $$

Similarly, snakes can be applied to motion tracking: once a snake has "locked on" to a salient visual feature, it can follow that feature across frames.

## **Discussion and Shortcomings**

Snakes were the first variational approach to image segmentation, which makes this a seminal paper in image processing. By introducing the gradient (edge strength) as an energy term, the authors improved on the shortcomings of previously published work. Because snakes are "active" and are therefore always minimizing their energy, they show hysteresis in response to moving input.

The energy minimization is performed by gradient descent, and since the energy functional is not convex, the snake can get stuck in local minima. A prerequisite for using snakes is therefore that the curve be initialized sufficiently close to the desired solution. Moreover, because of numerical instabilities, the curve can develop self-intersections as the control points move over time, which is undesirable in image segmentation.
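The sensitivity to initialization can be illustrated with a toy example. The sketch below is an assumption-laden simplification rather than the paper's method: a single point stands in for the whole contour, the image is a hypothetical 1D intensity profile with a weak edge at $x = 2$ and a strong edge at $x = 8$, and the point follows plain explicit gradient descent on the 1D analogue of $E_{edge}$. All names and constants (`EDGES`, `WIDTH`, `descend`) are made up for illustration.

```python
import math

# Hypothetical 1D intensity profile (assumed for illustration):
# a weak edge at x = 2 and a strong edge at x = 8.
EDGES = [(1.0, 2.0), (3.0, 8.0)]  # (contrast, position) pairs
WIDTH = 0.5                       # blur width of each edge

def intensity_slope(x):
    """Analytic dI/dx for an intensity made of sigmoid-shaped edges."""
    total = 0.0
    for contrast, position in EDGES:
        s = 1.0 / (1.0 + math.exp(-(x - position) / WIDTH))
        total += contrast * s * (1.0 - s) / WIDTH
    return total

def edge_energy(x):
    """1D analogue of E_edge = -|grad I|^2."""
    return -intensity_slope(x) ** 2

def descend(x0, step=0.05, iterations=2000, h=1e-5):
    """Explicit gradient descent on edge_energy, starting from x0.
    The derivative is approximated with a central difference."""
    x = x0
    for _ in range(iterations):
        grad = (edge_energy(x + h) - edge_energy(x - h)) / (2.0 * h)
        x -= step * grad
    return x

near_weak = descend(3.0)    # starts next to the weak edge and settles there
near_strong = descend(7.0)  # starts next to the strong edge and settles there
print(near_weak, edge_energy(near_weak))
print(near_strong, edge_energy(near_strong))
```

The point started at $x = 3$ stays at the weak edge even though the strong edge has much lower energy, because descent only sees the local slope. A full snake behaves analogously, which is why the initial curve has to be placed near the desired contour.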