Statistical models for deformable templates in image and shape analysis (Modèles statistiques d’atlas déformables pour l’analyse d’images et de formes)

High dimensional data are increasingly frequent in many application fields, and it has become particularly important to be able to extract meaningful features from such data sets. The deformable template model is a popular way to achieve this. This paper reviews the statistical aspects of this model as well as its generalizations. We describe the different mathematical frameworks used to handle different data types as well as the deformations. We recall the theoretical convergence properties of the estimators and the numerical algorithms used to compute them. We end with some published examples.


Introduction
Due to the increasing number of images available in many fields, image analysis has become an important challenge. This is particularly the case in the medical domain, where images of different modalities are easier and easier to acquire. This amount of data paves the way to population analysis: characterizing a population in terms of representative features, clustering heterogeneous populations and, of course, classifying new observations. This review paper focuses on methods providing a statistical analysis of a population in terms of both a representative mean image and some characteristic geometrical variations within the observed population.
We are interested in the study of statistical deformable models, whose ideas go back to D'Arcy Thompson in his book "On Growth and Form" [49] for the study of animal and human anatomical shapes, and which were generalized and formulated mathematically by Grenander in [31] for image analysis. The key idea is to assume that, for a given homogeneous population, each observation is a regular deformation of a mean representative image, called the template, plus some additive noise. The deformation does not affect the template itself but moves the underlying space the image is painted on, carrying the color information from one location to another. The deformations are also characteristic of the population, making a difference among populations in terms of both the photometry (through the template) and the geometry (through the deformation distribution). Together they will be called the atlas in the sequel. The main issue is to learn this atlas as the parameter of a statistical generative model when we observe a population of images. We will summarize here the different aspects of this model and the theoretical guarantees we have for this statistical estimation problem.
The general framework in this setting is the following: let H be a separable Hilbert space modeling a set of objects and G be a group acting on H, meaning that for any (h, φ) ∈ H × G the (left) action • of G on H is such that φ • h ∈ H. Suppose that the data at hand Y_1, . . ., Y_n are i.i.d. random variables in H satisfying the model:

Y_i = φ_i • T + W_i, 1 ≤ i ≤ n, (1.1)

where T ∈ H is an unknown template, W_i ∈ H is an additive noise (typically due to the measurement process) independent of the {φ_i}_{1≤i≤n}, which are i.i.d. random variables belonging to G modeling a geometric source of variability in the data. Given a template, the specification of a geometric deformation distribution and an additive noise model, observations {Y_i}_{i>0} can be simulated as independent realizations of the generative stochastic process (1.1). We present successively in this paper some classes of objects handled by this model and some typical deformation groups. These specify the mathematical frameworks which have to be used to deal with the large variety of observations we encounter. Some extensions of model (1.1), which include time dependency as well as mixture models, and enable population clustering, are also described to overcome some model limitations. In Section 3, we outline the theoretical consistency of the template estimator and the full atlas estimation issues in some of the presented mathematical frameworks. Section 4 presents the most efficient and popular algorithms to numerically perform this estimation. The theoretical convergence of these algorithms is recalled. The paper ends with some results obtained by applying these methods to several types of data sets.

An overview of deformable template models
The deformable template model (1.1) applies to many different situations. In this section, we give an overview of the different aspects of the same model, focusing on the data and the deformations.

Types of data
The typical data considered by the deformable template model are signals (functions from Ω ⊂ R to R) or images of dimension d (functions from Ω ⊂ R^d to R), where Ω is a compact subset and d = 2, 3 or 4, including time sequences. In this setting, the Hilbert space H is the space of square integrable functions H = L²(Ω, dx) with respect to the Lebesgue measure.
To define the gray level of the template T at a given point x ∈ Ω, more regularity is required on this template function. It is usually assumed that the template is at least piecewise continuous.
When we consider real data, we are usually provided with digital inputs which are discrete, giving gray-level values on a regular grid of points. To adapt the deformable template model (1.1), we assume that T is defined on the whole subset Ω and belongs to H, whereas each observation Y_i is a discretization of the deformed template on a regular grid. This dimension reduction step is also shared by the noise structure W, which can therefore be defined as a finite dimensional random vector. This discretization implies that the observations do not actually belong to the same Hilbert space as the template, but for the sake of simplicity we omit these two subsamplings for Y_i and W_i in the notation of Eq. (1.1).
Concerning the action of a deformation on such a template, the idea is that the image (a gray-level map on Ω) is carried by the underlying space Ω ⊂ R^d, defined by its coordinate system. The deformation moves the points of the space, which transports the gray-level information. Therefore, the gray level at location x of the deformed template is the gray level of the template at location φ^{-1}(x), when φ is invertible. This writes:

(φ • T)(x) = T(φ^{-1}(x)).

For the definition of this action, two conditions are required. First, we need to consider invertible φ's or to provide a meaningful approximation of φ^{-1}. Moreover, the deformation φ should be defined on Ω.
In many applications it is preferable to extract meaningful features from the image data prior to performing shape analysis. Considering the case of medical image data, such features will usually be derived from manual or automated segmentations of anatomical landmarks on the images. From a mathematical modeling viewpoint, such features take the form of sparse geometrical data lying in the ambient space: points or groups of points, curves or surfaces, needing specific models to be treated. More precisely, if we want to stick to our template/deformation model, we need to define a specific Hilbert space for each kind of geometrical data in use. The first and easiest type is the so-called landmark case, where features form a sequence of n labeled points in the space Ω. In this case we set H to be the Euclidean space (R^d)^n, and the action of any deformation map φ ∈ G on x = (x_1, . . ., x_n) ∈ H simply writes φ.x = (φ(x_1), . . ., φ(x_n)).
When it comes to unlabeled groups of points, the previous model is not suitable. It has been proposed in [30] to model such features as sums of Dirac functionals embedded in the dual space of a functional Hilbert space. More precisely, an unlabeled set of points {x_1, . . ., x_n} ⊂ Ω is modeled as the measure µ = Σ_{i=1}^n a_i δ_{x_i}, where non-equal weights a_i may model differences in relevance or confidence of locations between subsets of the set of points. Next, assume there exists a Hilbert space H of functions defined on Ω such that all evaluation functionals δ_x : f ↦ f(x), x ∈ Ω, are continuous linear forms on H, or in other words Vect{δ_x, x ∈ Ω} ⊂ H*. We set H = H*, the dual space of H, together with its dual Hilbert metric: for any µ ∈ H, ‖µ‖_H = sup{µ(f), f ∈ H, ‖f‖_H ≤ 1}. For the objects in use, sums of Dirac deltas, this dual metric takes an explicit form, which gives this model one of its main practical interests: if µ = Σ_{i=1}^n a_i δ_{x_i}, then

‖µ‖²_H = Σ_{i=1}^n Σ_{j=1}^n a_i a_j k_H(x_i, x_j),

where k_H denotes the reproducing kernel of the space H. Indeed, the assumption made on H exactly states that H is a Reproducing Kernel Hilbert Space (RKHS), and the space H is uniquely specified by its kernel k_H. The action of a deformation map φ ∈ G on an element µ ∈ H is the usual push-forward of the measure µ, defined by duality as ∀f ∈ H, (φ.µ)(f) = µ(f ∘ φ). For sums of Dirac deltas, we simply get the explicit formula φ.µ = Σ_{i=1}^n a_i δ_{φ(x_i)}.
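As an illustration of how explicit this dual metric is, the sketch below computes ‖µ‖_H for a weighted point set under a Gaussian reproducing kernel. The kernel choice, its bandwidth and the function names are ours, not from the text; any valid reproducing kernel could be substituted.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """An assumed reproducing kernel k_H(x, y) = exp(-|x - y|^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def measure_norm(points, weights, sigma=1.0):
    """Dual RKHS norm of mu = sum_i a_i delta_{x_i}:
    ||mu||_H^2 = sum_{i,j} a_i a_j k_H(x_i, x_j)."""
    s = 0.0
    for xi, ai in zip(points, weights):
        for xj, aj in zip(points, weights):
            s += ai * aj * gaussian_kernel(xi, xj, sigma)
    return np.sqrt(s)
```

With this kernel, a single Dirac with unit weight has norm k_H(x, x)^{1/2} = 1, and two coincident Diracs with opposite weights cancel exactly, as the duality formula predicts.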
The previous measure framework can be used directly to treat the case of curves or surfaces embedded in Ω, because such objects can be modeled as measures. For example, let C ⊂ Ω be a curve, parameterized by γ : [0, 1] → Ω. Then one can associate to C a measure µ ∈ H via the rule

µ(f) = ∫_0^1 f(γ(s)) |γ'(s)| ds.

However, this does not provide a purely geometric model, since the action of deformation maps will in general modify the mass repartition on the image curve. In other words, the measure φ.µ will in general not be the uniform measure associated to the curve φ(C). A purely geometrical model is obtained when one replaces measures by currents, i.e. measures acting on spaces of differential forms. In this new setting, the space H is replaced by a Hilbert space W of m-differential forms, where m is the dimensionality of the data, and the current µ associated to an oriented m-submanifold S of Ω is defined via the rule µ(ω) = ∫_S ω. In the example of the curve C, this writes ∀ω ∈ W, µ(ω) = ∫_0^1 ω(γ(s))(γ'(s)) ds, and the dual norm in H = W* writes in terms of the kernel k_W, which in its general form is a map from Ω² to the space of symmetric bilinear forms of R^d:

‖µ‖²_H = ∫_0^1 ∫_0^1 k_W(γ(s), γ(t))(γ'(s), γ'(t)) ds dt.

The push-forward of a deformation map φ ∈ G on an element µ ∈ H is defined via duality as ∀ω ∈ W, (φ.µ)(ω) = µ(φ.ω), where for all x ∈ Ω and α_1, . . ., α_m ∈ R^d, (φ.ω)(x)(α_1, . . ., α_m) = ω(φ(x))(dφ(x).α_1, . . ., dφ(x).α_m).
As opposed to the scalar case, this action is consistent with the action of φ on subsets: if µ is the current associated to an oriented m-submanifold S, then φ.µ will be the current associated to φ(S).
We will not detail this specific representation any further and refer to [53, 29] for developments in the case of curves and surfaces.

Various deformation models
A simple setting to study the problem of estimating the template T in model (1.1) is to consider the case where G is a connected, semi-simple, and compact Lie group, and H = L²(G) is the Hilbert space of complex valued, square integrable functions on the group G with respect to the Haar measure dφ. In [11] and [12], it has been proposed to consider the nonparametric estimation of a complex valued template T : G → C in the following deformable white noise model:

dY_i(φ) = T(φ_i^{-1} φ) dφ + ε dW_i(φ), φ ∈ G, i = 1, . . ., n, (2.2)

where the W_i's are independent copies of a standard Brownian sheet W on the topological space G with reference measure dφ, ε > 0 is the level of noise in the measurements, and the random variables φ_i are supposed to be i.i.d. with a known density h ∈ L²(G). The white noise model (2.2) is a continuous model that is useful for studying the statistical estimation of T in the asymptotic setting n → +∞, as it avoids the problem of controlling the bias introduced by any discretization scheme. Some concrete examples of model (2.2) include the analysis of translated two-dimensional images, which corresponds to the case G = R²/Z² (the torus in dimension two), and which finds its applications in biomedical imaging or satellite remote sensing (see e.g. [26]). Other examples are rotation models for two-dimensional or three-dimensional images, for which either G = SO(2) or G = SO(3) (the special orthogonal group in dimension 2 or 3), when the images at hand are observed through the action of random rotations (see e.g. [32, 42, 45]). Note that model (2.2) corresponds to a specific case where the group G and the domain Ω (on which the template T is defined) are the same. In practice the data are rather observed on a discrete grid of time points or pixels belonging to a convex set Ω ⊂ R^d. Thus, an alternative model is

Y_{i,ℓ} = T(φ_i^{-1} • x_ℓ) + σ W_{i,ℓ}, ℓ = 1, . . ., p, i = 1, . . ., n, (2.3)

where T ∈ H = L²(Ω), the φ_i's are i.i.d. random variables in G, the W_{i,ℓ}'s are i.i.d. standard Gaussian variables, σ > 0 is the level of additive noise, and the x_ℓ's are a set of p deterministic sampling points in Ω (e.g. x_ℓ = ℓ/p in the setting Ω = [0, 1]). Model (2.3) assumes that G is a group acting on the domain Ω, meaning φ • x ∈ Ω for any (φ, x) ∈ G × Ω. In model (2.3), the main goal is to construct consistent estimators of the template T and to recover the geometric variability of the data modeled by the φ_i's in the double asymptotic setting min(n, p) → +∞ (both the number of images and the number of sampling points tend to infinity).
Compact groups reduce to deformations which are highly constrained, typically rigid-body deformations. Therefore, to allow deformations with more degrees of freedom, one can think of elastic maps. A first step in this generalization considers that a deformation is given by the displacement of each point in the definition domain Ω. This yields the existence of a velocity vector field, denoted v : Ω → R^d, such that for all x ∈ Ω, φ(x) = x + v(x). Note that no constraint on v ensures the invertibility of φ. This requires providing an adapted definition of the previously introduced actions when they involve the inverse of φ. The inverse of the mapping is approximated at first order (for ‖v‖ small) by Id − v, where Id is the identity map on R^d.
This framework has a big advantage as it provides an easy characterization of the deformation through v. However, v remains an infinite dimensional variable to consider and estimate. To reduce the problem to a finite dimensional one, one may assume that this vector field has a finite decomposition on a dictionary basis. The velocity field is then stated as follows: given a dictionary of functions (b_k)_{1≤k≤k_G} from Ω to R^d, for all φ ∈ G there exist coefficients (α_k)_{1≤k≤k_G} such that

φ(x) = x + v(x) = x + Σ_{k=1}^{k_G} α_k b_k(x). (2.4)

The regularity of φ depends on the choice of the dictionary basis. This parametric model aims at constructing a consistent estimator of the full atlas as the number of observations tends to infinity.
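To make the parametric small-deformation model concrete, here is a minimal sketch on a 1D signal: the velocity field is a finite linear combination of Gaussian bumps (an assumed dictionary), and the action on the signal uses the first-order inverse approximation Id − v discussed above. All names, the dictionary and the parameter values are illustrative choices of ours.

```python
import numpy as np

def velocity(x, centers, alphas, sigma=0.2):
    """v(x) = sum_k alpha_k b_k(x), with Gaussian bumps b_k as an assumed dictionary."""
    v = np.zeros_like(x)
    for c, a in zip(centers, alphas):
        v += a * np.exp(-(x - c) ** 2 / (2 * sigma ** 2))
    return v

def deform_signal(template, x, centers, alphas, sigma=0.2):
    """Deformed signal (phi . T)(x) = T(phi^{-1}(x)) ~= T(x - v(x)),
    using the first-order inverse approximation Id - v."""
    xq = np.clip(x - velocity(x, centers, alphas, sigma), x[0], x[-1])
    return np.interp(xq, x, template)  # linear interpolation of the discrete template

x = np.linspace(0.0, 1.0, 200)
T = np.sin(2 * np.pi * x)                     # a toy template
Y = deform_signal(T, x, centers=[0.5], alphas=[0.05])  # one noiseless deformed sample
```

With all coefficients α_k set to zero the map is the identity and the signal is returned unchanged, which is a quick sanity check of the action.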
As already pointed out, the previous model of elastic, "small" deformations does not guarantee the invertibility of the maps. A more rigorous approach takes the previous framework as a model for infinitesimal deformations only, building a group of diffeomorphisms by integrating such displacement fields. This construction has been introduced in several works [50, 18] and has been intensely developed since then to derive efficient techniques for shape registration and analysis (see e.g. [36, 44, 9, 27, 6], among many others). The construction starts by specifying a Hilbert norm on the space V of displacement fields (see [50]). Then we consider time-dependent families of such vector fields v ∈ L²([0, 1], V). The deformation map φ^v associated to v is defined as the flow at time one of this family:

∂φ_t^v/∂t = v_t ∘ φ_t^v, φ_0^v = Id, φ^v = φ_1^v. (2.5)

The induced group of deformation maps G = {φ^v, v ∈ L²([0, 1], V)} can be equipped with the distance d(Id, φ) = inf{‖v‖_{L²([0,1],V)}, φ^v = φ}. This exactly states that G is given the structure of an infinite-dimensional manifold, on which distances are computed as the length of minimal geodesic paths connecting two elements. In fact, it is possible to derive the equations of geodesics, which can be seen as a special case of the Euler-Poincaré equations of motion, coined EPDiff ([35]). This allows one to consider a new parameterization of deformation maps φ ∈ G via the initial velocity field v_0. For simplicity, we only detail the finite-dimensional case below.
The large deformation representation boils down to a finite dimensional one when carried out in a discrete setting. Indeed, when the support of the data in use forms a finite set of points (as in the case of landmarks or point clouds), or can be approximated by a finite set of points (as e.g. for curves, surfaces or signals/images), then one only needs to consider initial velocity maps which are linear combinations of kernel functions centered at these points. More explicitly, if this set of discretization points is (x_i)_{1≤i≤n}, then the initial velocity map can be written as

v_0(x) = Σ_{i=1}^n k_V(x, x_i) α_i,

where the α_i ∈ R^d are the coefficients of the linear combination, called momentum vectors. Equivalently, it is possible to parameterize v_0 via the velocity vectors v_0(x_i). The complete system of ODEs that expresses the deformation map φ parameterized by a finite set of initial momentum vectors α_i is the following:

dx_i(t)/dt = Σ_j k_V(x_i(t), x_j(t)) α_j(t),
dα_i(t)/dt = −Σ_j (α_i(t) · α_j(t)) ∇_1 k_V(x_i(t), x_j(t)),
dφ_t(x)/dt = Σ_j k_V(φ_t(x), x_j(t)) α_j(t).

The first two equations form the geodesic equations, while the third one is Eq. (2.5) in this discrete setting.
Note that these geodesic equations in this finite dimensional case make it possible to parameterize the deformation by the initial momenta together with the initial control point locations.
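The first two (geodesic) equations above can be sketched as follows for landmarks in R², with a scalar Gaussian kernel and a plain forward Euler integrator; both are assumed choices of ours, and real implementations favor higher-order or symplectic integrators.

```python
import numpy as np

def k(x, y, s=1.0):
    """Assumed scalar Gaussian kernel k_V(x, y)."""
    return np.exp(-np.sum((x - y) ** 2) / (2 * s ** 2))

def grad1_k(x, y, s=1.0):
    """Gradient of k_V with respect to its first argument."""
    return -(x - y) / s ** 2 * k(x, y, s)

def shoot(x0, a0, steps=50, s=1.0):
    """Euler integration of the landmark geodesic (EPDiff) equations:
        dx_i/dt =  sum_j k(x_i, x_j) alpha_j
        da_i/dt = -sum_j (alpha_i . alpha_j) grad_1 k(x_i, x_j)
    Returns landmark positions and momenta at time 1."""
    x, a = x0.copy(), a0.copy()
    n, dt = len(x), 1.0 / steps
    for _ in range(steps):
        dx = np.array([sum(k(x[i], x[j], s) * a[j] for j in range(n)) for i in range(n)])
        da = np.array([-sum((a[i] @ a[j]) * grad1_k(x[i], x[j], s) for j in range(n))
                       for i in range(n)])
        x, a = x + dt * dx, a + dt * da
    return x, a
```

For a single landmark the momentum is constant (∇_1 k_V(x, x) = 0), so the point travels in a straight line by exactly its initial momentum, a useful sanity check.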

Mixture models
In the observation model (2.2), the additive random fields W_i are critical for the modeling of statistical dependencies between the values of the deformed template φ_i • T and the observations Y_i. For instance, when the objects are images, they describe how the intensities (gray-level values) of the observations relate to those of the template. Assumptions made on the probability laws of these fields completely determine the nature of the dependencies between observations and template. In particular, the basic assumption that the fields W_i are white noise implies that these dependencies have the same statistical properties all over the space.
There are many situations where such an assumption is not valid (see for instance [34] and Section 5.2). For instance, a problem arises in contrast-enhanced medical imaging when two tissues having different contrast-agent absorption properties are observed with the same intensity range before contrast injection but two different ones after injection. In such a case, the relationships between the intensities of pre- and post-contrast images depend on tissue types, and are not spatially invariant.
To deal with such situations, [34] proposed to define the probability distribution of W_i conditionally on the class of each pixel. Let us assume that the pixels of the domain Ω can be divided into K classes on which the template-observation dependencies differ, and denote by L_k(x) the probability for a pixel x to belong to the kth class. The probability distribution of W_i at a pixel x is then defined conditionally on its class, yielding a mixture over the K class-conditional distributions weighted by the L_k(x). In the defined model, the so-called class map L = {L(x_ℓ), ℓ ∈ [[1, p]]} is unknown and has to be estimated together with the deformations. In [34], the estimation of the deformations and the class map is jointly performed by a MAP approach. It leads to a simultaneous classification of template pixels and registration of the template onto the observations, which is called classifying registration.
Besides, some spatial constraints on pixel classes can be added to the model by specifying a probability distribution on the map L. For instance, in the case K = 2 (two classes), this distribution can be chosen so as to restrict the amount of pixels of class 0 while enforcing the spatial homogeneity of the classes.
A mixture of the deformable template model (1.1) involving several templates can also be considered. It is particularly appealing when one wants to extract reference curves or patterns from data featuring high dispersion [3, 41]. In this setting, the data are described with a set of K templates {T_1, . . ., T_K} and each observation Y_i is assigned to a (possibly unknown) class J_i ∈ {1, . . ., K} such that, given J_i = j, (1.1) writes:

Y_i = φ_i • T_j + W_i.

In a parametric approach, each elementary component is characterized by its own parameter set θ_j. In particular, each has its own template function T_j, a prior deformation distribution p_j and a prior weight w_j, such that Σ_{j=1}^K w_j = 1. As a result, the mixture of deformable models can be fully parameterized by θ = (θ_1, . . ., θ_K). The number of classes is fixed, but priors on the parameters or other arguments enable the estimation of the optimal number of classes for the analyzed population.
Retrieving the data diversity becomes possible by using a mixture model to estimate the templates {T_1, . . ., T_K}, which can then be used to perform unsupervised classification (see Section 5).
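As a toy illustration of how K estimated templates can drive classification, the sketch below assigns an observation to the class maximizing its posterior weight under isotropic Gaussian noise. For simplicity the deformations φ_i are ignored, so this shows only the photometric part of the mixture model; all names, templates and parameter values are ours.

```python
import numpy as np

def classify(y, templates, weights, sigma=0.1):
    """MAP class assignment (deformations ignored, an assumed simplification):
    j* = argmax_j [ log w_j - ||y - T_j||^2 / (2 sigma^2) ]."""
    logpost = [np.log(w) - np.sum((y - t) ** 2) / (2 * sigma ** 2)
               for w, t in zip(weights, templates)]
    return int(np.argmax(logpost))

x = np.linspace(0.0, 1.0, 50)
T1, T2 = np.sin(2 * np.pi * x), np.cos(2 * np.pi * x)   # two toy templates
y = T1 + 0.05 * np.random.default_rng(0).normal(size=50)  # noisy draw from class 1
```

In the full model the squared distance would be replaced by the marginal likelihood of Y_i under component j, integrating over the deformation distribution p_j.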

Spatiotemporal models
In some cases, observations in model (2.2) are sequential, and can be naturally ordered by time. This is for instance the case in dynamic contrast-enhanced imaging, where a series of images of a same body part is produced to capture the dynamics of contrast-agent absorption by tissues (see [33] and Section 5.3). In such a situation, time variations of the observations may not be exclusively geometrical (i.e. caused by deformations). Variations may also appear as changes in observation values (photometric changes when observations are images). In dynamic contrast-enhanced imaging, these variations are due to the contrast-agent diffusion. This other source of variation can be included in the model by defining a time-dependent template. Letting t_1 < · · · < t_m be the acquisition times, the observation model is then of the form:

Y^{(i)} = φ^{(i)} • T^{(i)} + W^{(i)}, i = 1, . . ., m,

where Y^{(i)}, φ^{(i)}, T^{(i)}, W^{(i)} are the observation, the deformation, the template and the noise at time t_i, respectively. However, the model defined above cannot be used without setting some constraints on the temporal template. Indeed, defined with a large degree of freedom, the temporal template can account not only for photometric variations but also for geometrical variations. In other words, without limiting the degrees of freedom, the two sources of observation variations are indistinguishable in the model. One way of fixing this issue consists in restricting the template definition to a class of parametric functions which is representative of the expected photometric variations. Such a solution was proposed in [33] in the context of dynamic contrast-enhanced imaging. The template was defined using a class of functions which are known to be well suited for fitting dynamic curves of contrast absorption. This class is characterized by just a few interpretable parameters.
In [33], the model is further constrained by partitioning the pixels of Ω into classes of common temporal photometric variation: for all pixels x belonging to a same class k, the temporal template values follow a curve f_k, where f_k is a parametric temporal curve for the kth class. Pixel classes and associated temporal curves are both estimated at the same time using an iterative algorithm, while the number of classes is set automatically during the iterations. In this model, photometric variations are estimated globally on the pixels of a same class rather than locally on each pixel. This reduces the influence of local geometrical variations on the estimation of photometric variations.
Another aspect of the dynamical model is to consider that the time-dependent template is a smooth deformation of a baseline at a reference time point. This is particularly well adapted to the evolution of the shapes of brain structures such as the hippocampus, caudate or basal ganglia along development or aging. Contrary to the spatiotemporal template previously described, it does not allow a change of contrast or, equivalently, of the topology of the shape. However, this framework makes it possible to capture a subject growth model and also to produce a global template which evolves in time and space. This provides a mean growth scenario of a given time-evolving population.
Constructing a growth model of a single subject is similar to performing regression on a set of images. When a sequence of images is observed for a subject, one can estimate the evolution scenario from the first time point to the last observed one by deforming the initial shape so that it matches the intermediate observations. Several models have been proposed, which differ from each other in the smoothness of the time dependency of the regression model ([22, 55, 23]). In any case, the model writes:

Y^{(i)} = φ^{(t_i)} • T + W^{(i)}, i = 1, . . ., p,

where (Y^{(i)})_{1≤i≤p} is the longitudinal data, T is the baseline image (that could be fixed as the first acquired image or estimated in a joint optimization) and φ^{(t)} is the continuous regression function. This scheme has been extended to estimate a spatiotemporal atlas from longitudinal data in which different subjects have been observed at several time points. The spatiotemporal atlas consists of a mean image and its evolution along time, together called the mean scenario of the population, as well as the correspondences that map this mean scenario to the evolutions of the n subjects. There are different ways to transport the mean scenario (a time-varying flow of deformations) from the template space to the space of each subject. In [46, 40, 39], the mapping is done by parallel transport. In [22], an adjoint transport has been proposed, together with a possible change in time dynamics. The observations are surfaces, the template is therefore a current and the deformation is assumed diffeomorphic through the LDDMM framework introduced above. We refer to [22] for all the details of this atlas estimation. Note that this framework applies directly to any other type of data, such as images.

Consistent estimation of a template
We summarize here the theoretical results on asymptotic convergence for the template estimation problem. Considering different situations, we describe the statistical consistency properties which can be achieved.

Nonparametric approach
A nonparametric approach is to consider that the template T belongs to an infinite dimensional space F ⊂ H such as a Sobolev ball, and that it cannot be parameterized by a finite number of parameters. In [11] and [12], it is shown that, in model (2.2), the density h of the random elements φ_i ∈ G plays the role of the kernel of a convolution operator that has to be inverted to construct an optimal estimator of T. Indeed, in (2.2), E[T(φ_i^{-1} φ)] = ∫_G T(φ̃^{-1} φ) h(φ̃) dφ̃ is the convolution over the group G between the template T and the density h. Hence, it is possible to build an estimator of T using a regularized deconvolution method over Lie groups. This class of inverse problems is based on the use of harmonic analysis and Fourier analysis on compact Lie groups to transform the convolution into a product of Fourier coefficients. In [11] and [12], it is shown that such estimators of T achieve an optimal rate of convergence over Sobolev balls in the asymptotic setting where the number n of images tends to infinity. A more standard approach to estimate a template is a two-step procedure, which consists in first computing estimators φ̂_1, . . ., φ̂_n of the unobserved deformations φ_1, . . ., φ_n and then averaging the data after an alignment step, which yields an estimator of the form

T̂_0(x) = (1/n) Σ_{i=1}^n T̂_i(φ̂_i x), (3.1)

where T̂_i(x) denotes some estimator of the unknown i-th image T(φ_i^{-1} x) in either model (2.2) or model (2.3). Such a procedure clearly depends on the quality of the estimation of the unobserved deformations. In the simple setting where Ω = G = S¹ ≅ [0, 1[ (the torus in dimension 1), the following lower bounds have been established in [10] and [12]: for any estimators (φ̂_1, . . ., φ̂_n) ∈ S_1^n and under mild assumptions on T and the density h of the random variables φ_1, . . ., φ_n, one has in model (2.2) that

lim inf_{n→+∞} E[ (1/n) Σ_{i=1}^n (φ̂_i − φ_i)² ] > 0, (3.2)

whereas in model (2.3) the corresponding lower bound (3.3) involves a constant depending on T, h and the noise level, and vanishes as the number p of sampling points grows. Therefore, Eq. (3.2) shows that, in model (2.2) and in the asymptotic setting n → +∞, it is not possible to build consistent estimators of the deformations, in the sense that one cannot have lim inf_{n→+∞} E[(1/n) Σ_{i=1}^n (φ̂_i − φ_i)²] = 0. Thus, in such a setting, estimators T̂_0 based on an alignment step (as defined in (3.1)) are not likely to be consistent estimators of the template T. On the contrary, in model (2.3) and in the double asymptotic setting min(n, p) → +∞, equation (3.3) suggests that one can compute consistent estimators of the deformation parameters. This implies that, in model (2.3), it is possible to construct estimators T̂_0 via an alignment procedure such that T̂_0 converges to T as min(n, p) → +∞. For a detailed discussion, we refer to [10].
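The two-step alignment procedure of (3.1) can be illustrated on randomly shifted 1D signals on the discrete torus, a toy stand-in for model (2.3) with a large number p of sampling points. Estimating the shifts by circular correlation with a reference observation is our choice here, not a prescription of the text.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 128, 50
x = np.arange(p) / p
T = np.exp(-((x - 0.5) ** 2) / 0.01)   # a toy template on the discrete torus

# Simulate model (2.3) with random circular shifts and additive Gaussian noise.
shifts = rng.integers(0, p, size=n)
Y = np.stack([np.roll(T, s) + 0.05 * rng.normal(size=p) for s in shifts])

# Step 1: estimate each shift by maximizing circular correlation with Y[0].
est = [int(np.argmax([np.dot(np.roll(y, -s), Y[0]) for s in range(p)])) for y in Y]
# Step 2: align and average, i.e. T_hat(x) = (1/n) sum_i Y_i(phi_hat_i x).
T_hat = np.mean([np.roll(y, -s) for y, s in zip(Y, est)], axis=0)
```

The aligned mean recovers the template up to the unknown shift of the reference observation Y[0], illustrating that the template is identifiable only up to a global deformation.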
To conclude this section, we would like to mention that another point of view for statistical inference in either model (2.2) or (2.3) is to consider that the deformations φ_i, i = 1, . . ., n, are fixed (non-random) parameters belonging to a finite dimensional Lie group. For various results in this semiparametric framework, we refer to [13, 14, 25, 56].

A frequentist approach using large deformations
Consider now the following case. The data are 2D images observed on a grid of pixels x_ℓ ∈ Ω, ℓ = 1, . . ., p, and the deformations in model (1.1) are large random diffeomorphisms φ_a of Ω that are solutions of an ODE governed by vector fields v_a : R² → R² parametrized by a set of coefficients a ∈ [−A, A]^{2K} for a given real A > 0; see [15] for further details.
To generate a large class of random diffeomorphisms, it is proposed in [15] to consider a random distribution on the vector a. Let us denote by P_A a probability distribution on [−A, A]^{2K}. The data at hand are then supposed to satisfy the model

Y_{i,ℓ} = T(φ_{a_i}^{-1}(x_ℓ)) + W_{i,ℓ}, ℓ = 1, . . ., p, i = 1, . . ., n,

where the a_i's are i.i.d. random variables sampled from the distribution P_A, the W_{i,ℓ}'s represent an additive noise independent from the vectors a_i, and T : Ω → R is an unknown template.
In [15], it has been proposed to estimate T by a frequentist approach using M-estimation techniques (see e.g. [52] for an introduction). Denote by Z = {Z : Ω → R} a set of uniformly bounded images (e.g. by the maximum gray level), and consider the generative model above with a ∼ P_A and W some additive noise. For a given image Z ∈ Z, define the function f(Z; a, W) as the optimal cost to align the image Z onto the random image Y = (Y_ℓ)_{1≤ℓ≤p} using a deformation parametrized by some u ∈ [−A, A]^{2K}. Then, define the contrast function F as F(Z) = ∫ f(Z; a, W) dP(a, W), where dP(a, W) is the product measure on the deformation coefficients a and the noise. In practice, one can only compute the empirical contrast F_n, defined as

F_n(Z) = ∫ f(Z; a, W) dP_n(a, W) = (1/n) Σ_{i=1}^n f(Z; a_i, W_i),

where P_n(a, W) = (1/n) Σ_{i=1}^n δ_{a_i, W_i}. Then, one can define a sequence of sets of estimators Q̂_n = argmin_{Z∈Z} F_n(Z), whose asymptotic behavior is compared with the deterministic one Q_0 = argmin_{Z∈Z} F(Z). In [15], various conditions are discussed to ensure the consistency of the M-estimator Q̂_n, in the sense that any accumulation point of a sequence of images Ẑ_n ∈ Q̂_n belongs to Q_0 almost surely.
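A minimal sketch of the empirical contrast F_n is given below, with circular shifts standing in for the parametric diffeomorphisms of [15] (a drastic simplification of ours) and a squared-error alignment cost; the noise enters only through the observations themselves.

```python
import numpy as np

def f(z, y, shifts):
    """Optimal cost to align the candidate template z onto the observation y,
    minimizing over a finite set of deformations (here circular shifts,
    a toy stand-in for u in [-A, A]^{2K})."""
    return min(np.sum((y - np.roll(z, u)) ** 2) for u in shifts)

def F_n(z, ys, shifts):
    """Empirical contrast F_n(Z) = (1/n) sum_i f(Z; Y_i)."""
    return np.mean([f(z, y, shifts) for y in ys])
```

An M-estimator would then minimize F_n over a (bounded) family of candidate images Z; here any observation that is an exact shifted copy of z contributes zero cost.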

A Bayesian approach using small deformations
We describe here the case where the template and the velocity vector fields of small deformations can be decomposed as finite linear combinations of some dictionary elements (see [1, 5]), as in Eq. (2.4). We focus on fixed subspaces determined by two sets of landmark points, (p_k)_{1≤k≤k_H} covering a domain Ω_p larger than Ω, and (g_k)_{1≤k≤k_G} ∈ Ω. The template (resp. the deformation) is defined as a linear combination of kernels K_H (resp. K_G) centered at the landmark points, parameterized by the coefficients w ∈ R^{k_H} (resp. α ∈ R^{d×k_G}). We write

T(x) = Σ_{k=1}^{k_H} w_k K_H(x, p_k), v(x) = Σ_{k=1}^{k_G} α_k K_G(x, g_k).

The model assumes that the observations are drawn with respect to the deformable template model (1.1). The deformations are however not observed and are treated as hidden random variables. This implies stating a distribution on the deformation coefficients α ∈ R^{d×k_G}, which are assumed to follow a multivariate Gaussian distribution with zero mean and full covariance matrix Γ_G. This has two advantages: first, it introduces correlations between the movements of the control points. Moreover, it enables catching the distribution of the observations as the marginal of the complete data (Y_i, φ_i)_{1≤i≤n}. This avoids trying to estimate the best deformation matching the template to each observation Y_i, which equals the mode of the posterior distribution of φ given Y_i and T (and which is not consistent; cf. (3.2) and (3.3)). This uncertainty is taken into account when marginalizing with respect to the deformations.
The complete statistical model writes: for all 1 ≤ i ≤ n,

Y_i = φ_i • T + σ W_i, with φ_i = Id + v_{α_i} and α_i ∼ N(0, Γ_G),

where the W_i are i.i.d. standard normal random variables. Given this model, the parameters of interest, denoted by θ, are the template T, the covariance matrix Γ_G of the deformation space and the noise variance σ². The parameter estimates are obtained by maximizing the posterior likelihood of θ conditional on (Y_i)_{1≤i≤n} in a Bayesian framework. It has been proved in [1] that for any finite sample the maximum a posteriori exists and that the MAP estimator is consistent. Without assuming that the observations are generated by the model described above, but only that they follow a distribution P, we get the following. Let Θ* = {θ* ∈ Θ | E_P(log q(y|θ*)) = sup_{θ∈Θ} E_P(log q(y|θ))}, where q is the observed likelihood.

Theorem 3.1 (Consistency on bounded prototypes). Under mild assumptions on the model, the restriction Θ* ∩ Θ_R of Θ* to a bounded subset Θ_R of Θ is not empty, and for any ε > 0,

lim_{n→∞} P( δ(θ̂_n, Θ* ∩ Θ_R) ≥ ε ) = 0,

where θ̂_n denotes the MAP estimator and δ is any metric compatible with the topology on Θ_R.

Algorithms for atlas estimation
Now that the theoretical convergence properties are established, we describe the numerical schemes used to compute these estimates.

Deterministic approach
The usual formulation of template estimation expresses the solution as the minimum of an energy. This energy is the sum of the contributions of the n observation registrations, each contribution being a tradeoff between a data-fidelity term and a penalty on the deformation:

E(T, (φ_i)_{1≤i≤n}) = Σ_{i=1}^n [ (1/(2σ²)) ‖Y_i − T ∘ φ_i^{-1}‖² + R(φ_i) ],   (4.1)

where R denotes the deformation penalty and the tradeoff σ² has to be chosen by the user as a function of the level of detail he/she expects from the registration. The minimization of this energy is performed by optimizing iteratively with respect to the deformations and the template, one while the other is fixed. Starting from an initial template (usually the mean of the grey-level images), the n matching problems can be solved separately, by minimizing the n registration energies related to the n subjects. This is most of the time done by gradient descent, due to the complex nonlinear dependence of the energy on the deformation parameters (e.g. velocity coefficients of the linear combination of basis elements, initial momenta, etc.). Given these best matches (φ̂_i)_{1≤i≤n}, the template is computed as the minimizer of the sum of the n energies taken at φ_i = φ̂_i. Many applications of this method can be found in the literature, for all the deformations and data types presented above and many others [15,48,21,37,54].
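As an illustration only, this alternating scheme can be sketched on 1D toy signals; the Gaussian kernels, penalty weight, noise level and the first-order pull-back used in the template step are all assumptions of this sketch, not the choices of the cited works:

```python
import numpy as np
from scipy.optimize import minimize

# Toy alternating minimisation of an energy of the form (4.1) on 1D signals:
# (i) register each observation to the current template by gradient-based
# minimisation, (ii) update the template at the best matches.
rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 64)
g = np.array([0.3, 0.7])                          # geometric control points
K = np.exp(-((x[:, None] - g[None, :]) ** 2) / 0.02)
sigma2, lam = 0.05 ** 2, 10.0                     # tradeoff and penalty weight

def warp(T, alpha):
    """Small-deformation action: T(x - v_alpha(x))."""
    return np.interp(x - K @ alpha, x, T)

def energy(alpha, y, T):
    return ((y - warp(T, alpha)) ** 2).sum() / (2 * sigma2) + lam * (alpha ** 2).sum()

def estimate_template(Y, n_iter=3):
    T = Y.mean(axis=0)                            # usual initialisation: grey-level mean
    for _ in range(n_iter):
        # matching step: n separate registration problems
        alphas = [minimize(energy, np.zeros(2), args=(y, T)).x for y in Y]
        # template step: average the observations pulled back by the best matches,
        # using the first-order inverse y(x + v(x))
        T = np.mean([np.interp(x + K @ a, x, y) for y, a in zip(Y, alphas)], axis=0)
    return T

# toy data: deformed and noisy versions of an unknown bump
true_T = np.exp(-((x - 0.5) ** 2) / 0.01)
Y = np.stack([np.interp(x - K @ (0.05 * rng.normal(size=2)), x, true_T)
              + 0.02 * rng.normal(size=64) for _ in range(6)])
T_hat = estimate_template(Y)
```

The template step here uses the closed-form minimizer of the quadratic data term at fixed matches, i.e. an average of back-deformed observations.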
This optimization scheme can be interpreted in two ways. The first is geometric. Looking for a template satisfying Eq. (4.1) amounts to finding the mean image whose orbit under the action of the group of deformations is the closest to the observations. This is similar to computing the Fréchet mean of the observations, seen as random variables belonging to a nonlinear manifold. Indeed, the Fréchet mean [24] extends the usual Euclidean mean to nonlinear spaces endowed with non-Euclidean metrics. Minimizing the energy (4.1) can also be interpreted as finding a template which minimizes a dissimilarity measure, namely a tradeoff between image registration and a penalty term preventing too-large deformations. This dissimilarity measure can be interpreted as a kind of "non-Euclidean metric", which makes the connection between minimizing (4.1) and computing an empirical Fréchet mean of the observations; see [28,10] for further details.
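In this geometric reading, with d denoting the dissimilarity measure above, the estimated template plays the role of the empirical Fréchet mean:

```latex
\hat{T} \;=\; \underset{T}{\operatorname{argmin}} \; \sum_{i=1}^{n} d\left(T, Y_i\right)^{2} .
```

When d is a geodesic distance on a manifold this is exactly the definition of [24]; here d is only a tradeoff between registration fidelity and deformation penalty, whence the quotation marks around "non-Euclidean metric".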
The second interpretation comes from the statistical modeling of the template estimation problem. The energy (4.1) can be seen as the negative log-likelihood of the observations and the deformations in the model presented in Eq. (3.9), given the parameters θ. Minimizing this energy is therefore equivalent to maximizing the joint likelihood of the observations and the random deformations.
Note that this optimization algorithm tries to find the best deformations fitting the template to the observations. In the discrete setting introduced in Subsection 2.2, we have shown that the diffeomorphic deformation is parameterized by both the initial momenta and the initial locations of the control points, which allows optimization with respect to both sets of parameters. This optimization is detailed in [20], where the number of control points is also optimized using an L¹ penalty on the momenta.

Stochastic approach
The deterministic approaches aim at estimating the template through the estimation of the deformations first. Although these techniques produce truly impressive results on various databases, the lower bound (3.3) suggests trying other approaches when the noise level increases. The statistical model (3.9), in which the deformations are unobserved, appears to be an appropriate answer. Whereas the deterministic approach maximizes the complete likelihood (of the observations together with the random deformations), one may rather focus on the observed likelihood only. This amounts to maximizing the marginal of the previous quantity with respect to the deformation coefficients, which falls naturally into the setting for which the Expectation-Maximization (EM) algorithm was developed.
Let us briefly recall the EM steps. This iterative algorithm aims at maximizing the observed likelihood. Starting from an initial parameter value θ_0, it creates a sequence of estimates (θ_k)_{k∈N}. First, it computes the expectation (E-step) of the complete log-likelihood with respect to the posterior distribution of the unobserved data using the current parameters:

Q(θ | θ_k) = E[ log q(Y, φ; θ) | Y; θ_k ].

The second step updates the parameter by maximizing (M-step) the function Q: θ_{k+1} = argmax_θ Q(θ | θ_k). A wide range of models fall into the so-called exponential family, for which the complete log-likelihood has the form log q(Y, φ; θ) = ⟨S(Y, φ), ψ(θ)⟩ + Φ(θ), where S is a vector-valued function called the sufficient statistic and ψ and Φ are two measurable functions. This particular form simplifies the computations, since the E-step only requires the expectation of the sufficient statistic. Although quite complex, the deformable template model (3.9) belongs to this family. In this setting, the maximization step is explicit as a function of the sufficient statistic: θ_{k+1} = θ̂(s_k), where s_k is the conditional expectation of S at iteration k.
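For concreteness, here is a minimal, purely illustrative instance of this exponential-family mechanism on a scalar analogue of the model: Y_ij = φ_i + σW_ij with φ_i ~ N(0, γ) and J repeated measurements per hidden variable (so that (γ, σ²) is identifiable). The posterior of φ_i is Gaussian, so the E-step (expected sufficient statistics) is exact and the M-step θ(s) is explicit:

```python
import numpy as np

# Toy exponential-family EM: Y_ij = phi_i + sigma*W_ij, phi_i ~ N(0, gamma).
rng = np.random.default_rng(3)
gamma_true, sigma2_true, n, J = 2.0, 0.5, 2000, 4
phi = rng.normal(0.0, np.sqrt(gamma_true), n)
Y = phi[:, None] + rng.normal(0.0, np.sqrt(sigma2_true), (n, J))

gamma, sigma2 = 1.0, 1.0                          # theta_0
for _ in range(300):
    # E-step: posterior of phi_i given Y_i is Gaussian N(m_i, v)
    v = 1.0 / (1.0 / gamma + J / sigma2)          # posterior variance
    m = v * (J / sigma2) * Y.mean(axis=1)         # posterior means
    s1 = np.mean(m ** 2 + v)                      # E[phi_i^2 | Y]
    s2 = np.mean((Y - m[:, None]) ** 2) + v       # E[(Y_ij - phi_i)^2 | Y]
    # M-step: explicit update theta_{k+1} = theta(s_k)
    gamma, sigma2 = s1, s2
```

The two expected sufficient statistics (s1, s2) are all the E-step has to produce; the M-step then reads the new (γ, σ²) directly from them, exactly the structure exploited by the deformable template model.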
Unfortunately, the posterior distribution does not have a simple form, so that this expectation cannot be computed in closed form. Several methods have therefore been proposed to reach an estimate of the parameters by approximating this expectation. One option is the deterministic algorithm presented above, where the expectation is approximated by a single evaluation of the integrand at the mode of the posterior distribution. Another approach uses a stochastic version of the EM algorithm, called Stochastic Approximation EM (SAEM) [19], coupled with Markov chain Monte Carlo (MCMC) [5]. The E-step is replaced by a simulation of the unobserved random variables, namely the deformations, from an ergodic Markov chain transition kernel. A stochastic approximation of the sufficient statistics is then performed using these samples. The M-step is unchanged.
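On the same toy latent-Gaussian model as above, the MCMC-SAEM loop can be sketched as follows; the random-walk proposal, the warm-up phase with constant step sizes and the Robbins-Monro exponent are arbitrary tuning choices of this sketch:

```python
import numpy as np

# Toy MCMC-SAEM: simulation step by random-walk Metropolis on the hidden
# variables, stochastic approximation of the sufficient statistics, explicit
# M-step.  Model: Y_ij = phi_i + sigma*W_ij, phi_i ~ N(0, gamma), J = 4.
rng = np.random.default_rng(4)
n, J = 500, 4
phi_true = rng.normal(0.0, np.sqrt(2.0), n)
Y = phi_true[:, None] + rng.normal(0.0, np.sqrt(0.5), (n, J))

def logpost(p, gamma, sigma2):
    """Unnormalised log posterior of the hidden variables given Y and theta."""
    return -p ** 2 / (2 * gamma) - ((Y - p[:, None]) ** 2).sum(axis=1) / (2 * sigma2)

gamma, sigma2 = 1.0, 1.0
phi = np.zeros(n)                                  # current state of the n chains
s = np.zeros(2)                                    # running sufficient statistics
for k in range(400):
    # simulation step: one Metropolis move per hidden variable
    prop = phi + 0.5 * rng.normal(size=n)
    logratio = logpost(prop, gamma, sigma2) - logpost(phi, gamma, sigma2)
    phi = np.where(np.log(rng.uniform(size=n)) < logratio, prop, phi)
    # stochastic approximation of the sufficient statistics
    S = np.array([np.mean(phi ** 2), np.mean((Y - phi[:, None]) ** 2)])
    eta = 1.0 if k < 100 else 1.0 / (k - 99) ** 0.7   # warm-up, then decay
    s = s + eta * (S - s)
    # M-step: explicit, as in the exact EM
    gamma, sigma2 = max(s[0], 1e-6), max(s[1], 1e-6)
```

Compared with the exact EM above, only the E-step changed: posterior expectations are replaced by samples from a Markov chain, averaged with decreasing step sizes.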
The almost sure convergence of this algorithm towards the MAP estimator has been proved in several situations, first in [5] for the Bayesian mixed-effect model presented in Subsection 3.2.2, then in [3] for the mixture of this model detailed in Subsection 2.3. The requirements for convergence concern the Markov chain, which has to be geometrically ergodic on any compact subset. Several choices may be made to satisfy this criterion [4,2]. In [47], particular representations of deformations and templates are introduced, using a common finite-element decomposition of the image domain. The resulting deformation and template fields have Markovian properties, facilitating sampling and accelerating the convergence of the algorithm.
When the data are observed sequentially, one can use the online EM [16]. A sequential algorithm is useful for several reasons. First, storing the observations throughout the process (which might be resource-intensive for high-dimensional data) is no longer required in an online setting, because each observation is processed only once. Second, when the E-step conditional expectation is intractable, as in the deformable template model, its approximation is much lighter than in a batch algorithm. Finally, when the observed data evolve during the acquisition time, sequential learning provides the evolution of the parameter trend throughout the iterations.
Under some mild assumptions, verified by the deformable template model, it is shown that the sequence of estimates (θ_k)_{k∈N} converges with probability one to the set of stationary points of the Kullback-Leibler divergence between the marginal distribution of the observations and the model distribution.
Compared to the SAEM algorithm detailed earlier, the difference lies in the fact that the stochastic approximation involves the conditional expectation of the sufficient statistics given only the last available observation. This online EM can also be coupled with MCMC methods when the conditional expectation E[S(Y, φ) | Y_{k+1}; θ̂_k] is intractable.
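A minimal illustrative sketch of this online procedure, again on the scalar toy model used above (the E-step is exact here; in the deformable template case it would be replaced by an MCMC approximation, and the warm-up length and step-size exponent are arbitrary):

```python
import numpy as np

# Toy online EM: each observation Y_{k+1} (J repeated scalar measurements of a
# hidden phi) is processed once; the running sufficient statistics are updated
# with the conditional expectation given the last observation only.
rng = np.random.default_rng(5)
J = 4
gamma, sigma2 = 1.0, 1.0                          # theta_0
s = np.zeros(2)
for k in range(20000):
    # new observation arrives (true parameters: gamma = 2, sigma^2 = 0.5)
    y = rng.normal(0.0, np.sqrt(2.0)) + rng.normal(0.0, np.sqrt(0.5), J)
    # E-step for the last observation only: posterior of phi is Gaussian
    v = 1.0 / (1.0 / gamma + J / sigma2)
    m = v * (J / sigma2) * y.mean()
    S = np.array([m ** 2 + v, np.mean((y - m) ** 2) + v])
    eta = 1.0 / (k + 2) ** 0.6                    # decreasing step sizes
    s = s + eta * (S - s)
    if k > 50:                                    # M-step after a short warm-up
        gamma, sigma2 = s[0], s[1]
```

No observation is stored: the pair of running statistics `s` is all the memory the algorithm keeps, which is the practical advantage described in the text.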
We can notice that at iteration k of the online EM, a single chain targeting the posterior distribution of the deformation q(· | Y_{k+1}; θ̂_k) is required, whereas n chains targeting the posterior distributions q(· | Y_1; θ̂_k), ..., q(· | Y_n; θ̂_k) are necessary in the SAEM algorithm, n denoting the total number of observations. However, as already noticed in [3], the mixture model suffers from its sensitivity to the sampler, which can lead to trapping states (states where the numerical value of the acceptance ratio is so low that the chain does not mix). In [17], it is suggested to slightly modify the original model (Section 2.3). The sampling scheme is adapted so that a Markov chain targeting the joint posterior distribution of the class and the deformations is run on an extended state space with a Metropolis-within-Gibbs algorithm. In this way, the class sampling takes advantage of all the extended hidden data and allows jumps from one model to another, provided that consistent deformations are proposed. With this sampling scheme, the posterior simulation of the class random variable is performed more efficiently, as the chain visits more models.
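To fix ideas, the extended sampling scheme can be sketched on a toy two-class 1D example: a Metropolis move on the deformation given the current class, then an exact draw of the class given the deformation. All templates, kernels and numerical values are assumptions of this sketch:

```python
import numpy as np

# Toy Metropolis-within-Gibbs on the extended state (class c, deformation a).
rng = np.random.default_rng(7)
x = np.linspace(0.0, 1.0, 64)
K = np.exp(-((x[:, None] - np.array([0.3, 0.7])) ** 2) / 0.02)
T = [np.exp(-((x - 0.35) ** 2) / 0.01),           # class-0 template
     np.exp(-((x - 0.65) ** 2) / 0.01)]           # class-1 template
sigma2, lam = 0.05 ** 2, 10.0
# observation drawn from class 0, slightly deformed and noisy
y = np.interp(x - K @ np.array([0.02, -0.01]), x, T[0]) + 0.05 * rng.normal(size=64)

def logjoint(c, a):
    """Unnormalised log posterior of (class, deformation) given y."""
    r = y - np.interp(x - K @ a, x, T[c])
    return -(r ** 2).sum() / (2 * sigma2) - lam * (a ** 2).sum()

c, a = 1, np.zeros(2)                              # deliberately wrong initial class
counts = np.zeros(2)
for k in range(500):
    prop = a + 0.02 * rng.normal(size=2)           # Metropolis move on a | c
    if np.log(rng.uniform()) < logjoint(c, prop) - logjoint(c, a):
        a = prop
    w = np.array([logjoint(0, a), logjoint(1, a)]) # exact Gibbs draw of c | a
    pr = np.exp(w - w.max()); pr /= pr.sum()
    c = rng.choice(2, p=pr)
    counts[c] += 1
```

Because the class is redrawn from its full conditional at every sweep, the chain escapes the wrong initial class as soon as a reasonable deformation is available, which is the mixing behaviour sought in [17].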

Applications
In this section, we present some of the results on template estimations from the deformable template model.

Estimated templates
The first application is based on the USPS database. Although the images are very simple, this example illustrates the challenges of the problem. In particular, it clearly shows templates that look relevant; it also allows reconstructing synthetic samples from the estimated model, so as to evaluate the geometric variability that has been captured. In addition, since a test database is available, the model can be used as a classifier, where each new image is assigned to the class with maximum likelihood. The images are 16 × 16 grey-level images. Fig. 5.1 presents the noisy training sample used to estimate the ten templates shown on the right-hand side. For more detailed results on this example, see [1,5,3].
Another training set, of 3D medical images (left rows of Fig. 5.2), has also been tested. These binary images represent murine dendrite spine excrescences, obtained from genetically modified mice engineered to mimic Parkinson's disease. The template is presented in Fig. 5.3, and some synthetic examples drawn from the deformable template model with the estimated parameters are shown in the two right rows of Fig. 5.2. For more details on this experiment, see [4,2].
The same databases were used to estimate multicomponent atlases, assuming that the population contained two classes. The results for the murine dendrite spine excrescences are presented in the right panel of Fig. 5.3. These examples show the importance of being able to cluster the data within a given population. Indeed, the templates are very different from each other and therefore summarize the population much more precisely.

Classification via maximum a posteriori
Thanks to the generative model presented in Section 3.2.2, one can classify new data by computing its likelihood of belonging to the different classes. After estimating the parameters of a deformable template model for each class, classification is performed by computing the maximum posterior on the class given the image. This classification has been performed on USPS (cf. Table 5.1). The classification results show the importance of having enough, but not too many, images per component, and of estimating the complete atlas, namely the template as well as the geometrical variability (through the covariance matrix of the deformation distribution). When allowing for at most 15 components per class (some may be empty) and training on the whole available USPS training sample (7291 images), the classification reaches an error rate of 3.5%. The estimation was done with the template estimated by the deterministic algorithm (FAM-EM); however, the results are similar with the other estimation algorithm (MCMC-SAEM) since, in the noise-free case, the estimated parameters are very close to each other. In the noisy case, however, the stochastic algorithm outperforms the deterministic one, which can be clearly seen in the parameter estimates [5].
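A hedged 1D sketch of this maximum-a-posteriori rule (not the experimental code of the cited works): with one atlas per class, the intractable class likelihood is approximated by the complete log-likelihood at the best deformation, and the new observation is assigned to the highest-scoring class. Templates, kernels and weights below are arbitrary:

```python
import numpy as np
from scipy.optimize import minimize

x = np.linspace(0.0, 1.0, 64)
g = np.array([0.3, 0.7])                           # geometric control points
K = np.exp(-((x[:, None] - g[None, :]) ** 2) / 0.02)

def warp(T, alpha):
    """Small-deformation action: T(x - v_alpha(x))."""
    return np.interp(x - K @ alpha, x, T)

def class_score(y, T, lam=10.0, sigma2=0.05 ** 2):
    """Approximate log p(y | class) by the mode over the deformation."""
    neg = lambda a: ((y - warp(T, a)) ** 2).sum() / (2 * sigma2) + lam * (a ** 2).sum()
    return -minimize(neg, np.zeros(2)).fun

def classify(y, templates):
    """Assign y to the class with maximal (approximate) posterior."""
    return int(np.argmax([class_score(y, T) for T in templates]))

# two toy class templates and a noisy, slightly deformed sample from class 0
T0 = np.exp(-((x - 0.35) ** 2) / 0.01)
T1 = np.exp(-((x - 0.65) ** 2) / 0.01)
rng = np.random.default_rng(6)
y = np.interp(x - K @ np.array([0.03, -0.02]), x, T0) + 0.05 * rng.normal(size=64)
```

The deformation penalty in the score is what prevents a distant template from "cheating" with an implausibly large deformation: geometry, not only photometry, enters the decision.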
A second test set is based on infrared images of combat aircrafts from the French Aerospace Lab ONERA.We refer to [38] for more details on the data and the classification performances.

Classifying registration
The mixture model presented in Section 2.3 was used for the registration of mammograms in the context of computer-aided detection of breast cancer [34]. Usual automated detection techniques include comparisons of several mammograms, either from the left and right breasts of the same patient, from the same breast at different dates, or from the same breast before and after a contrast agent injection. In general, such techniques aim at localizing abnormal image differences which are signs of lesions. However, their application requires correcting for other differences due to normal factors. This is illustrated in images (a) to (c) of Figure 5.4, which show a pair of bilateral mammograms and their difference image. In image (c), the observed differences result not only from the unilateral lesion (a circular opacity in the top left part of the breast) but also from variations in breast size and positioning (particularly visible near the bottom left part of the breast boundary). In this context, image registration can be used as a means to compensate for some normal differences. However, the corrections made by registration should not concern abnormal differences: these should be preserved so that lesion detection remains possible on the registered images. Usual registration techniques do not fulfill this specific requirement. Relying on the assumption that intensity dependencies are spatially invariant, they do not distinguish between differences on normal and abnormal tissues, and correct both.
This shortcoming can be overcome using the mixture model of Section 2.3 with two classes (K = 2), one for normal tissues and another for lesions. As shown in images (e) and (f) of Figure 5.4, this model tends to enhance abnormal differences by reducing almost exclusively the normal ones. Moreover, by combining pixel classification and image registration, it also produces a lesion map (see image (d)) which can be directly used for detection.

Spatiotemporal templates
The spatiotemporal model (2.7) presented in Section 2.4 was successfully applied to dynamic contrast-enhanced sequences of computed tomography images (DCE-CT sequences) [33]; some 2D slices taken from a DCE-CT sequence are shown in the first row of Figure 5.5. This modality relies on the injection of a contrast agent into the body. It is intended to observe, over a short time period, the diffusion of the agent through a body part, and to access the absorption properties of tissues. It is particularly used in oncology for the diagnosis and management of tumors, which are characterized by specific absorption properties.
The interpretation of DCE-CT signals is often ambiguous due to patient movements, which produce geometric variations interfering with contrast variations. Hence, so as to assist the radiologist in the analysis, it is particularly useful to develop computational tools which can separate both sources of variation and compensate for the geometric one. The spatiotemporal model (2.7) can be used to achieve this task: the temporal template it estimates is a version of the sequence with attenuated noise and compensated movements. For illustration, an estimated temporal template is presented in the second row of Figure 5.5.

Conclusion
This article has summarized the statistical aspects of the deformable template model in different mathematical frameworks. The weaknesses of this atlas estimation are of two types. First, the model itself can be improved, in particular concerning the deformation parameter distribution: the Gaussian prior, however convenient, is not always realistic, and other distributions should be investigated. The second weakness concerns the numerical aspects. Due to the high dimension of the data and their increasing number, the computational efficiency of the algorithms has to be optimized, particularly concerning the covariance matrix estimation; the estimation of this high-dimensional full matrix has to be considered carefully in order to avoid memory issues. However, this model also features major advantages. First, it describes a population in terms of statistical descriptors such as mean and covariance. It also approximates the generation of these images or shapes, which allows a better understanding of the different aspects, geometric and photometric, of the training set. Moreover, the generative form of the model enables resampling new elements, providing an augmented basis in which subtle features may appear more clearly. A second advantage of this model and its estimation algorithms lies in its theoretical properties. The consistency of the MAP estimators, as well as the convergence of the estimation algorithms, are crucial as they ensure the relevance of the results. Moreover, the theoretical bounds on deterministic procedures also provide information about the expected accuracy of the estimates. The last major advantage of this model is that it can be generalized to many different problems. For example, including the segmentation issue in a probabilistic atlas estimation, dealing with multimodal databases, or forcing the deformations to be diffeomorphic are current research directions which are all based on the model described here.
Other challenges remain open. Dealing in depth with the convergence issue that the deterministic algorithm faces would enable alternative options to be proposed to the user, depending on the data set he/she is considering. This aspect is of great importance, as it would also make the link with the usual variational approaches and their validation. The sensitivity to outliers is also important, particularly in the online estimation process: these data should be detected during the estimation and separated from the rest. This is partially achieved by the multi-template model, although selecting the number of components in advance remains difficult. The very high dimension of the variables is also a challenge. Reducing their dimension, and more particularly adapting the parametric models, should be part of the estimation. Finding variables of lower dimension may enable an easier interpretation of the estimates and also increase the classification power.
Many other works are related to the issue of template estimation, using different models (e.g. [51,43,8,54]) or algorithms ([7]). The results recalled here may be used to establish theoretical properties of these approaches which, to the best of our knowledge, do not exist yet.

Figure 5.1. Top: exemplars of the USPS training database. Bottom: the ten templates estimated from this USPS database.

Figure 5.2.Top: Exemplars of the binary images of the segmented murine dendrite spine excrescences.Bottom: Synthetic samples generated from the deformable template model with the estimated parameters.

Figure 5.3. Left: grey-level template estimated from the 3D murine dendrite spine excrescences. The binary volume (second image) is created by thresholding the previous continuous image. (Third and fourth images) Two spine excrescence templates estimated from the mixture deformable template model.

Figure 5.4. Images (a) and (b) form a pair of bilateral mammograms of the same woman. Image (c) is the difference between images (a) and (b) before registration. Image (e) is the deformation of image (b) obtained by registration of images (a) and (b) with a classifying registration technique. Image (f) is the difference between the registered images. Image (d) is the class map estimated by the classifying registration technique.