References of "Haesbroeck, Gentiane"
Comparison of robust detection techniques for local outliers in multivariate spatial data
Ernst, Marie ULg; Haesbroeck, Gentiane ULg

Conference (2014, August 22)

Spatial data are characterized by statistical units, with known geographical positions, on which non-spatial attributes are measured. Spatial data may contain two types of atypical observations: global and/or local outliers. The attribute values of a global outlier are outlying with respect to the values taken by the majority of the data points, while the attribute values of a local outlier are extreme when compared to those of its neighbors. Usual outlier detection techniques may be used to find global outliers, as the geographical positions of the data are not taken into account in this specific search. The detection of local outliers is more complex, especially when there is more than one non-spatial attribute. This talk focuses on local detection with two main objectives. First, we will briefly review some of the local detection techniques that seem to perform well in practice. Among these, one can find robust "Mahalanobis-type" detection techniques and a weighted PCA approach. We suggest an adaptation of one of these to further develop its local character. Then, examples and simulations, based on the linear model of coregionalisation with Matérn models, are reported and discussed in order to compare the different detection techniques objectively.
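
As a rough illustration of what a "Mahalanobis-type" local score can look like, the sketch below contrasts each site's attribute vector with the average of its k nearest spatial neighbours and measures that contrast with a Mahalanobis distance. It is not the procedure reviewed in the talk: the neighbourhood rule, the default k = 5, the toy grid and the use of the classical (non-robust) covariance are assumptions made to keep the example short.

```python
import numpy as np

def local_mahalanobis_scores(coords, attrs, k=5):
    """Squared Mahalanobis distance of each site's attributes from the mean
    attributes of its k nearest spatial neighbours.

    coords: (n, 2) geographical positions; attrs: (n, p) attributes, p >= 2.
    k is a hypothetical default, not a value taken from the talk.
    """
    coords = np.asarray(coords, dtype=float)
    attrs = np.asarray(attrs, dtype=float)
    # Pairwise Euclidean distances; a site is not its own neighbour.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    # Local contrast: site value minus the average of its neighbours.
    diffs = attrs - attrs[neighbours].mean(axis=1)
    # Classical centre and covariance of the contrasts; a robust pair
    # (e.g. from MCD) would be plugged in here to limit masking.
    centred = diffs - diffs.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(diffs, rowvar=False))
    return np.einsum("ij,jk,ik->i", centred, cov_inv, centred)

# Toy example: smooth attributes on a 10 x 10 grid, one site shifted.
coords = np.array([(i, j) for i in range(10) for j in range(10)], dtype=float)
attrs = np.column_stack([coords[:, 0] + coords[:, 1], coords[:, 0] - coords[:, 1]])
attrs[55] += [10.0, -10.0]          # site (5, 5) gets a large local jump
scores = local_mahalanobis_scores(coords, attrs)
print(int(np.argmax(scores)))       # index of the most locally outlying site
```

Because the attributes vary smoothly across the grid, a site is only flagged when it departs from its own neighbourhood, which is the distinction between local and global outliers drawn above.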

Robustness and efficiency of multivariate coefficients of variation
Aerts, Stéphanie ULg; Haesbroeck, Gentiane ULg; Ruwet, Christel ULg

Conference (2014, August 12)

The coefficient of variation is a well-known measure used in many fields to compare the variability of a variable in several populations. However, when the dimension is greater than one, comparing the variability only marginally may lead to conflicting results. Several multivariate extensions of the univariate coefficient of variation have been introduced in the literature. In practice, these coefficients can be estimated by using any pair of location and covariance estimators. However, when the classical mean and covariance matrix are used, the influence functions are unbounded, while robust estimators yield bounded influence functions. While useful in their own right, the influence functions of the multivariate coefficients of variation are further exploited in this talk to derive a general expression for the corresponding asymptotic variances under elliptical symmetry. Then, focusing on two of the considered multivariate coefficients, a diagnostic tool based on their influence functions is derived and compared, on a real-life dataset, with the usual distance-plot.
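
The four multivariate coefficients most often cited in this literature (attributed here to Reyment, Van Valen, Voinov and Nikulin, and Albert and Zhang) can be written as plug-in functions of a location vector and a covariance matrix, which makes the point about "any pair of estimators" concrete. The sketch below is our own transcription of these definitions, assuming NumPy; all four reduce to σ/|μ| when p = 1.

```python
import numpy as np

def multivariate_cvs(mu, sigma):
    """Four multivariate coefficients of variation computed from a location
    vector mu and a covariance matrix sigma (any estimator pair can be used)."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.atleast_2d(np.asarray(sigma, dtype=float))
    p = mu.size
    mu2 = mu @ mu  # squared Euclidean norm of the location
    return {
        # Reyment: generalized variance brought back to one dimension
        "reyment": np.sqrt(np.linalg.det(sigma) ** (1.0 / p) / mu2),
        # Van Valen: total variance
        "van_valen": np.sqrt(np.trace(sigma) / mu2),
        # Voinov and Nikulin: inverse Mahalanobis norm of the location
        "voinov_nikulin": np.sqrt(1.0 / (mu @ np.linalg.solve(sigma, mu))),
        # Albert and Zhang: variance in the direction of the location
        "albert_zhang": np.sqrt(mu @ sigma @ mu / mu2**2),
    }

print(multivariate_cvs([2.0], [[1.0]]))          # every entry equals 0.5
print(multivariate_cvs([1.0, 1.0], np.eye(2)))
```

Feeding the sample mean and covariance gives the classical estimators; feeding a robust location/scatter pair gives robust counterparts without changing the formulas.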

Eugène Catalan
Bair, Jacques ULg; Haesbroeck, Gentiane ULg

in Tangente (2014), 158

Eugène Catalan was born exactly 200 years ago! His mathematical work is remarkable both for its quantity (some 380 articles and 70 books or memoirs have been counted) and for its quality and variety. Indeed, his name is still associated today with many areas of mathematics, notably several famous conjectures on numbers or equations, the study of the numbers named after him in combinatorics, the introduction of semi-regular polyhedra and of minimal surfaces in geometry, and the computation of multiple integrals and of series in analysis. On the occasion of this anniversary, we highlight some aspects of his biography.

Multivariate coefficients of variation: comparison and influence functions
Aerts, Stéphanie ULg; Haesbroeck, Gentiane ULg; Ruwet, Christel ULg

E-print/Working paper (2014)

In the univariate setting, coefficients of variation are well known and used to compare the variability of populations characterized by variables expressed in different units or having very different means. When dealing with more than one variable, the use of such a relative dispersion measure is much less common, even though several generalizations of the coefficient of variation to the multivariate setting have been introduced in the literature. In this paper, the lack of robustness of the sample versions of the multivariate CVs is illustrated by means of influence functions, and a robust counterpart based on the Minimum Covariance Determinant (MCD) estimator is advocated. Then, focusing on two of the considered multivariate CVs, a diagnostic tool based on their influence functions is derived, and its efficiency in detecting observations having an unduly large effect on variability is illustrated on a real-life data set. The influence functions are also used to compute asymptotic variances under elliptical distributions, yielding approximate confidence intervals. Finally, simulations are conducted in order to compare the performance of the classical and robust multivariate CVs in a finite-sample setting. As expected, when the data are normally distributed, the classical estimator performs better than the robust counterpart based on the MCD estimator, while the reverse is true when the data are contaminated.
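
The unbounded influence of the classical estimators can be seen numerically with a sensitivity curve, a finite-sample stand-in for the influence function. The sketch below (NumPy; the Van Valen coefficient, the simulated data and the contamination points are our own choices) appends one observation at increasing distances. The robust MCD-based version advocated in the paper is not reproduced here, although an MCD location/scatter pair (for instance from scikit-learn's `MinCovDet`) could be substituted for the sample mean and covariance.

```python
import numpy as np

def van_valen_cv(x):
    """Classical Van Valen multivariate CV: sqrt(trace(S) / ||xbar||^2)."""
    mean = x.mean(axis=0)
    return np.sqrt(np.trace(np.cov(x, rowvar=False)) / (mean @ mean))

def sensitivity_curve(x, z, estimator):
    """(n + 1) * (T(x with z appended) - T(x)), a finite-sample influence curve."""
    return (len(x) + 1) * (estimator(np.vstack([x, z])) - estimator(x))

rng = np.random.default_rng(0)
x = rng.normal(loc=[10.0, 10.0], scale=1.0, size=(100, 2))
curve = [sensitivity_curve(x, np.array([10.0 + r, 10.0]), van_valen_cv)
         for r in (10, 100, 1000)]
print(curve)   # keeps growing as the added point moves away
```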

Robust detection techniques for multivariate spatial data
Ernst, Marie ULg; Haesbroeck, Gentiane ULg

Poster (2013, November 26)

Spatial data are characterized by statistical units, with known geographical positions, on which non-spatial attributes are measured. Two types of atypical observations can be defined: global and/or local outliers. The attribute values of a global outlier are outlying with respect to the values taken by the majority of the data points, while the attribute values of a local outlier are extreme when compared to those of its neighbors. Classical outlier detection techniques may be used to find global outliers, as the geographical positions of the data are not taken into account in this search. The detection of local outliers is more complex, especially when there is more than one non-spatial attribute. In this poster, two new procedures for local outlier detection are defined. The first adapts an existing technique, in particular by using a regularized estimator of the covariance matrix. The second measures outlyingness using a depth function.
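
Why regularization helps locally: a neighbourhood often contains fewer sites than there are attributes, so its sample covariance matrix is singular and cannot be inverted in a Mahalanobis distance. The poster's regularized estimator is not specified here; the sketch below assumes the common shrinkage towards a scaled identity (the same convex combination that scikit-learn's `ShrunkCovariance` uses), with the shrinkage weight `alpha` left as a hypothetical tuning parameter.

```python
import numpy as np

def shrunk_covariance(x, alpha):
    """(1 - alpha) * S + alpha * (trace(S) / p) * I, with S the sample covariance.

    For 0 < alpha <= 1 and a non-zero S the result is positive definite, even
    when x has fewer rows (neighbours) than columns (attributes)."""
    s = np.cov(x, rowvar=False)
    p = s.shape[0]
    return (1.0 - alpha) * s + alpha * (np.trace(s) / p) * np.eye(p)

rng = np.random.default_rng(5)
x = rng.normal(size=(3, 5))                  # 3 neighbours, 5 attributes
s = np.cov(x, rowvar=False)
print(np.linalg.matrix_rank(s))                               # at most 2: singular
print(np.linalg.eigvalsh(shrunk_covariance(x, 0.2)).min())   # strictly positive
```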

Detection of Local and Global Outliers in Spatial Data
Ernst, Marie ULg; Haesbroeck, Gentiane ULg

Conference (2013, July 11)

Spatial data are characterized by statistical units, with known geographical positions, on which non-spatial attributes are measured. Two types of atypical observations can be defined: global and/or local outliers. The attribute values of a global outlier are outlying with respect to the values taken by the majority of the data points, while the attribute values of a local outlier are extreme when compared to those of its neighbors. Classical outlier detection techniques may be used to find global outliers, as the geographical positions of the data are not taken into account in this search. The detection of local outliers is more complex, especially when there is more than one non-spatial attribute. In this talk, existing techniques are outlined and two new procedures are defined. The first adapts an existing technique, in particular by using a regularized estimator of the covariance matrix. The second measures outlyingness using a depth function.
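
The abstract does not say which depth function is used. As one possibility, the sketch below approximates projection depth, 1/(1 + O(z)), where O(z) is the Stahel-Donoho outlyingness; the maximum over all directions is replaced by a maximum over random unit directions, so the result is only an approximation. NumPy is assumed, and `n_directions` and `seed` are hypothetical parameters.

```python
import numpy as np

def projection_depth(points, data, n_directions=500, seed=0):
    """Approximate projection depth 1 / (1 + O(z)), with O(z) the largest
    robust z-score |u'z - med(u'X)| / MAD(u'X) over random unit directions u."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    points = np.atleast_2d(np.asarray(points, dtype=float))
    u = rng.normal(size=(n_directions, data.shape[1]))
    u /= np.linalg.norm(u, axis=1, keepdims=True)    # unit directions
    proj = data @ u.T                                # (n, n_directions)
    med = np.median(proj, axis=0)
    mad = np.median(np.abs(proj - med), axis=0)      # unscaled MAD
    outlyingness = np.max(np.abs(points @ u.T - med) / mad, axis=1)
    return 1.0 / (1.0 + outlyingness)

data = np.random.default_rng(7).normal(size=(200, 2))
print(projection_depth([[0.0, 0.0], [5.0, 5.0]], data))   # centre is deeper
```

Low depth relative to a site's neighbours would then play the role that a large Mahalanobis distance plays in the first approach.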

Prix Nobel d'Economie et mathématiques [Nobel Prize in Economics and mathematics]
Bair, Jacques ULg; Haesbroeck, Gentiane ULg

in Losanges (2013), 20

The 2012 Nobel Prize in Economics was awarded to the two American mathematicians Lloyd Shapley and Alvin Roth. This event gave us the opportunity to take a brief look at the international prizes that can be awarded to mathematicians.

Peer Reviewed
Classification performance resulting from 2-means
Ruwet, Christel ULg; Haesbroeck, Gentiane ULg

in Journal of Statistical Planning & Inference (2013), 143(2), 408-418

The k-means procedure is probably one of the most common nonhierarchical clustering techniques. From a theoretical point of view, it is related to the search for the k principal points of the underlying distribution. In this paper, the classification resulting from that procedure for k = 2 is shown to be optimal under a balanced mixture of two spherically symmetric and homoscedastic distributions. Then, the classification efficiency of the 2-means rule is assessed using the second-order influence function and compared to the classification efficiencies of the Fisher and logistic discriminations. Influence functions are also considered here to compare the robustness to infinitesimal contamination of the 2-means method with that of the generalized 2-means technique.
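
In one dimension the 2-means classification rule is simply "assign x to the nearer of the two centres", i.e. a cut at their midpoint. The sketch below runs Lloyd's iterations on a simulated balanced mixture of N(-2, 1) and N(2, 1), a case covered by the optimality result above, so the cut should land close to 0 and the error rate close to the Bayes error Φ(-2). The mixture parameters, sample size and seed are arbitrary choices.

```python
import numpy as np

def two_means_1d(x, n_iter=100):
    """Lloyd iterations for 2-means on a univariate sample; returns the centres."""
    c = np.array([x.min(), x.max()], dtype=float)    # crude starting centres
    for _ in range(n_iter):
        cut = c.mean()                               # nearest-centre boundary
        new_c = np.array([x[x <= cut].mean(), x[x > cut].mean()])
        if np.allclose(new_c, c):
            break
        c = new_c
    return c

rng = np.random.default_rng(1)
labels = rng.integers(0, 2, size=2000)               # balanced two-group mixture
x = rng.normal(loc=np.where(labels == 1, 2.0, -2.0), scale=1.0)
centres = two_means_1d(x)
threshold = centres.mean()                           # 2-means classification rule
error_rate = np.mean((x > threshold) != (labels == 1))
print(threshold, error_rate)   # threshold close to 0, error close to Phi(-2)
```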

Peer Reviewed
Robust estimation for ordinal regression
Croux, Christophe ULg; Haesbroeck, Gentiane ULg; Ruwet, Christel ULg

in Journal of Statistical Planning & Inference (2013), 143(9), 1486-1499

Ordinal regression is used for modelling an ordinal response variable as a function of some explanatory variables. The classical technique for estimating the unknown parameters of this model is Maximum Likelihood (ML). The lack of robustness of this estimator is formally shown by deriving its breakdown point and its influence function. To robustify the procedure, a weighting step is added to the Maximum Likelihood estimator, yielding an estimator with a bounded influence function. We also show that the loss in efficiency due to the weighting step remains limited. A diagnostic plot based on the Weighted Maximum Likelihood estimator makes it possible to detect outliers of different types in a single plot.
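
To make the weighting idea concrete, the sketch below writes down one common parametrization of the ordinal (cumulative logit) model, P(Y ≤ j | x) = F(α_j − x'β) with F the logistic distribution function, together with a weighted log-likelihood. The hard-rejection leverage weights based on classical Mahalanobis distances of the explanatory variables are a hypothetical stand-in; the weights used in the paper may differ, and no optimizer is shown.

```python
import numpy as np

def category_probs(X, beta, alpha):
    """Cumulative logit model P(Y <= j | x) = F(alpha_j - x'beta), j = 1..J-1,
    with F the logistic cdf and increasing cut-points alpha. Returns the
    (n, J) matrix of category probabilities P(Y = j | x)."""
    eta = np.asarray(alpha)[None, :] - (X @ beta)[:, None]
    cum = 1.0 / (1.0 + np.exp(-eta))
    n = X.shape[0]
    cum = np.hstack([np.zeros((n, 1)), cum, np.ones((n, 1))])
    return np.diff(cum, axis=1)

def weighted_loglik(X, y, beta, alpha, weights):
    """Weighted log-likelihood for categories y coded 0..J-1;
    all weights equal to 1 gives the ordinary ML objective."""
    probs = category_probs(X, beta, alpha)
    return np.sum(weights * np.log(probs[np.arange(len(y)), y]))

def leverage_weights(X, cutoff):
    """Hypothetical hard-rejection weights: 0 when the squared Mahalanobis
    distance of x_i (classical mean and covariance) exceeds cutoff, else 1."""
    centred = X - X.mean(axis=0)
    cov_inv = np.linalg.inv(np.atleast_2d(np.cov(X, rowvar=False)))
    d2 = np.einsum("ij,jk,ik->i", centred, cov_inv, centred)
    return (d2 <= cutoff).astype(float)

rng = np.random.default_rng(6)
X = rng.normal(size=(50, 2))
X[0] = [50.0, 50.0]                        # one extreme design point
beta, alpha = np.array([1.0, -0.5]), np.array([-1.0, 0.0, 1.0])   # J = 4
y = rng.integers(0, 4, size=50)
w = leverage_weights(X, cutoff=7.38)       # about the 0.975 quantile of chi-square(2)
print(w[0], weighted_loglik(X, y, beta, alpha, w))
```

Maximizing `weighted_loglik` over `beta` and `alpha` with such weights is what bounds the influence of points like `X[0]`; with unit weights one recovers the classical ML objective.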

Modèles chaotiques en économie [Chaotic models in economics]
Bair, Jacques ULg; Haesbroeck, Gentiane ULg

in Tangente Sup (2012), 63-64

In this volume devoted to the theme of forecasting, we show that certain mathematical models, based on nonlinear recurrence equations, can describe phenomena that appear random even though they are purely deterministic. An application to economics illustrates the discussion.
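
A classic example of this phenomenon (we do not know whether the article uses this exact model) is the logistic map x_{t+1} = r x_t (1 − x_t) with r = 4: a one-line deterministic recurrence whose trajectories look random and separate quickly from almost identical starting values. The starting values below are arbitrary.

```python
def logistic_map(x0, r=4.0, n_steps=60):
    """Iterate the deterministic recurrence x_{t+1} = r * x_t * (1 - x_t)."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

a = logistic_map(0.2)
b = logistic_map(0.2 + 1e-10)      # almost identical starting value
gap = [abs(u - v) for u, v in zip(a, b)]
print(gap[0], max(gap))            # a tiny initial gap grows to order one
```

Nothing in the recurrence is random, yet forecasting it far ahead requires knowing the starting value to an unrealistic precision.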

Peer Reviewed
Impact of contamination on training and test error rates in statistical clustering
Ruwet, Christel ULg; Haesbroeck, Gentiane ULg

in Communications in Statistics: Simulation & Computation (2011), 40(3), 394-411

The k-means algorithm is one of the most common nonhierarchical methods of clustering. It aims to construct clusters so as to minimize the within-cluster sum of squared distances. However, like most estimators defined through objective functions based on global sums of squares, the k-means procedure is not robust with respect to atypical observations in the data. Alternative techniques have thus been introduced in the literature, e.g. the k-medoids method. The k-means and k-medoids methodologies are particular cases of the generalized k-means procedure. In this paper, the focus is on the error rate these clustering procedures achieve when the data are expected to be distributed according to a mixture distribution. Two different definitions of the error rate are considered, depending on the data at hand. It is shown that contamination may make one of these two error rates decrease, even under optimal models. The consequences of this are emphasized by comparing the influence functions and breakdown points of these error rates.
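
The generalized k-means family replaces the squared distance by a penalty function. In one dimension, a Lloyd-type iteration only needs the per-cluster minimizer of the penalty: the mean for the squared penalty (k-means) and the median for the absolute penalty (k-medians). The sketch below uses k-medians as a stand-in for a more robust member of the family (k-medoids, mentioned above, additionally restricts centres to data points and is not coded); the data are a deterministic toy example with one gross outlier.

```python
import numpy as np

def generalized_two_means_1d(x, centre_fn, init=(-1.0, 1.0), n_iter=100):
    """Lloyd-type iterations for generalized 2-means in one dimension.

    centre_fn returns the per-cluster minimizer of the penalty: np.mean for
    the squared penalty (k-means), np.median for the absolute one (k-medians)."""
    c = np.array(init, dtype=float)
    for _ in range(n_iter):
        right = np.abs(x - c[1]) < np.abs(x - c[0])   # nearest-centre assignment
        new_c = np.array([centre_fn(x[~right]), centre_fn(x[right])])
        if np.allclose(new_c, c):
            break
        c = new_c
    return c

left = np.linspace(-3.5, -2.5, 50)
right = np.linspace(2.5, 3.5, 50)
x = np.concatenate([left, right, [20.0]])       # one gross outlier
print(generalized_two_means_1d(x, np.mean))     # right centre pulled to about 3.33
print(generalized_two_means_1d(x, np.median))   # right centre stays near 3.01
```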

Robustness in ordinal regression
Ruwet, Christel ULg; Haesbroeck, Gentiane ULg; Croux, Christophe

Conference (2010, October 14)

Logistic regression is a widely used tool designed to model the success probability of a Bernoulli random variable depending on some explanatory variables. A generalization of this binary model is the multinomial case, where the dependent variable has more than two categories. When these categories are naturally ordered (e.g. in questionnaires where individuals are asked whether they strongly disagree, disagree, are indifferent, agree or strongly agree with a given statement), one speaks of ordered or ordinal regression. The classical technique for estimating the unknown parameters is based on Maximum Likelihood estimation (e.g. Powers and Xie, 2008 or Agresti, 2002). However, as Albert and Anderson (1984) showed in the binary context, Maximum Likelihood estimates sometimes do not exist. Existence conditions in the ordinal setting, derived by Haberman in a discussion of McCullagh’s paper (1980), will be presented, together with a procedure to verify that they are fulfilled on a particular dataset. Moreover, Maximum Likelihood procedures are known to be vulnerable to contamination in the data. The lack of robustness of this technique in the simple logistic regression setting has already been investigated in the literature (e.g. Croux et al., 2002 or Croux et al., 2008). The breakdown behaviour of the ML-estimation procedure will be considered in the context of ordinal logistic regression. A robust alternative based on a weighting idea will then be suggested and compared to the classical one by means of their influence functions. Influence functions can be used to construct a diagnostic plot that detects influential observations for the classical ML procedure (Pison and Van Aelst, 2004).
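
In the simplest binary case with an intercept and one covariate, the existence result of Albert and Anderson reduces to an overlap check: the ML estimate fails to exist exactly when some threshold separates the two classes, ties at the threshold included (quasi-complete separation). The sketch below implements only this special case; Haberman's conditions for the ordinal model are more involved and are not coded here.

```python
import numpy as np

def logistic_mle_exists_1d(x, y):
    """Binary logistic regression with an intercept and one covariate x.

    The ML estimate exists iff both classes are present and no threshold c
    separates them (neither min(x | y=1) >= max(x | y=0) nor the mirror case)."""
    x0, x1 = x[y == 0], x[y == 1]
    if len(x0) == 0 or len(x1) == 0:
        return False
    return bool(x1.min() < x0.max() and x0.min() < x1.max())

print(logistic_mle_exists_1d(np.array([1.0, 2.0, 3.0, 4.0]), np.array([0, 0, 1, 1])))  # complete separation
print(logistic_mle_exists_1d(np.array([1.0, 2.0, 2.0, 3.0]), np.array([0, 0, 1, 1])))  # quasi-complete
print(logistic_mle_exists_1d(np.array([1.0, 2.0, 3.0, 4.0]), np.array([0, 1, 0, 1])))  # overlap
```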

Robust ordinal logistic regression
Ruwet, Christel ULg; Haesbroeck, Gentiane ULg; Croux, Christophe ULg

Conference (2010, June 28)

Logistic regression is a widely used tool designed to model the success probability of a Bernoulli random variable depending on some explanatory variables. A generalization of this binary model is the multinomial case, where the dependent variable has more than two categories. When these categories are naturally ordered (e.g. in questionnaires where individuals are asked whether they strongly disagree, disagree, are indifferent, agree or strongly agree with a given statement), one speaks of ordered or ordinal logistic regression. The classical technique for estimating the unknown parameters is based on Maximum Likelihood estimation. Maximum Likelihood procedures are, however, known to be vulnerable to contamination in the data. The lack of robustness of this technique in the simple logistic regression setting has already been investigated in the literature, either by computing breakdown points or influence functions. Robust alternatives have also been constructed for that model. In this talk, the breakdown behaviour of the ML-estimation procedure will be considered in the context of ordinal logistic regression. Influence functions will be computed and shown to be unbounded. A robust alternative based on a weighting idea will then be suggested. The influence functions of the ordinal logistic regression estimators may be used to compute classification efficiencies or to derive diagnostic measures; both uses will be illustrated on some examples.
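
The lack of robustness of ML is easy to reproduce in the binary special case (two ordered categories). The sketch below fits binary logistic regression by Newton-Raphson, then adds five bad leverage points (large x, response 0) and refits; the simulated model, sample size and contamination are our own choices. A weighted estimator of the kind suggested in the talk would downweight these five points.

```python
import numpy as np

def logistic_ml(X, y, n_iter=50):
    """ML for binary logistic regression by Newton-Raphson.
    X must contain a column of ones for the intercept."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                           # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])    # observed information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = (rng.uniform(size=200) < 1.0 / (1.0 + np.exp(-2.0 * x))).astype(float)
X = np.column_stack([np.ones_like(x), x])
clean = logistic_ml(X, y)

# Five identical bad leverage points: far to the right but labelled 0.
X_bad = np.vstack([X, np.tile([1.0, 20.0], (5, 1))])
y_bad = np.concatenate([y, np.zeros(5)])
contaminated = logistic_ml(X_bad, y_bad)
print(clean[1], contaminated[1])   # the slope collapses towards zero
```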

Robustness properties of the ordered logistic discrimination
Ruwet, Christel ULg; Haesbroeck, Gentiane ULg; Croux, Christophe

Scientific conference (2010, May 20)

Logistic regression is a widely used tool designed to model the success probability of a Bernoulli random variable depending on some explanatory variables. A generalization of this binary model is the multinomial case, where the dependent variable has more than two categories. When these categories are naturally ordered (e.g. in questionnaires where individuals are asked whether they strongly disagree, disagree, are indifferent, agree or strongly agree with a given statement), one speaks of ordered or ordinal logistic regression. The classical technique for estimating the unknown parameters is based on Maximum Likelihood estimation. Maximum Likelihood procedures are, however, known to be vulnerable to contamination in the data. The lack of robustness of this technique in the simple logistic regression setting has already been investigated in the literature, either by computing breakdown points or influence functions. Robust alternatives have also been constructed for that model. In this talk, the breakdown behavior of the ML-estimation procedure will be considered in the context of ordinal logistic regression. Influence functions will be computed and shown to be unbounded. A robust alternative based on a weighting idea will then be suggested and illustrated on some examples. The influence functions may also be used to derive diagnostic measures, as will be illustrated on some examples. Furthermore, breakdown points will be computed.

Peer Reviewed
RelaxMCD: smooth optimisation for the Minimum Covariance Determinant estimator
Schyns, Michael ULg; Haesbroeck, Gentiane ULg; Critchley, Frank

in Computational Statistics & Data Analysis (2010), 54(4), 843-857

The Minimum Covariance Determinant (MCD) estimator is a highly robust procedure for estimating the center and shape of a high-dimensional data set. It consists of determining a subsample of h points out of n that minimizes the generalized variance. By definition, computing this estimator gives rise to a combinatorial optimization problem, for which several approximate algorithms have been developed. Some of these approximations are quite powerful, but they do not take advantage of any smoothness in the objective function. In this paper, the focus is on the approach outlined in a general framework by Critchley et al. (2009), which transforms any discrete and high-dimensional combinatorial problem of this type into a continuous and low-dimensional one. The idea is to build on the general algorithm proposed by Critchley et al. (2009) in order to take into account the particular features of the MCD methodology. More specifically, the main goals of this paper are the adaptation of the algorithm to the specific MCD target function and the comparison of this “specialized” algorithm with the usual competitors for computing MCD. The adaptation focuses on the design of “clever” starting points in order to investigate the search domain systematically. Accordingly, a new and surprisingly efficient procedure based on the well-known k-means algorithm is constructed. The adapted algorithm, called RelaxMCD, is then compared by means of simulations and examples with FASTMCD and the Feasible Subset Algorithm, both benchmark algorithms for computing MCD. As a by-product, it is shown that RelaxMCD is a general technique encompassing the two others, yielding insight into their overall good performance.
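
To see why approximate algorithms are needed, the exact MCD can be computed by brute force on a toy data set: every h-subset is visited and the one with the smallest covariance determinant is kept. This is not RelaxMCD, FASTMCD or the Feasible Subset Algorithm, only the combinatorial problem they approximate; the data and h are arbitrary choices.

```python
import itertools
import math
import numpy as np

def exact_mcd(x, h):
    """Exact MCD by brute force: the h-subset whose covariance matrix has the
    smallest determinant. Only feasible for very small n."""
    best_det, best_subset = np.inf, None
    for subset in itertools.combinations(range(len(x)), h):
        det = np.linalg.det(np.cov(x[list(subset)], rowvar=False))
        if det < best_det:
            best_det, best_subset = det, subset
    return best_subset, best_det

rng = np.random.default_rng(3)
x = rng.normal(size=(12, 2))
x[:3] += 6.0                                  # rows 0-2 shifted away
subset, det = exact_mcd(x, h=7)
print(subset)                                 # the shifted rows should be absent
print(math.comb(12, 7), math.comb(100, 51))   # 792 subsets here; about 1e29 for n = 100
```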

Peer Reviewed
A relaxed approach to combinatorial problems in robustness and diagnostics
Critchley, Frank; Schyns, Michael ULg; Haesbroeck, Gentiane ULg et al.

in Statistics and Computing (2010), 20(1), 99-115

A range of procedures in both robustness and diagnostics requires optimisation of a target functional over all subsamples of a given size. Whereas such combinatorial problems are extremely difficult to solve exactly, something less than the global optimum can be ‘good enough’ for many practical purposes, as shown by example. Further, a relaxation strategy embeds these discrete, high-dimensional problems in continuous, low-dimensional ones. Overall, nonlinear optimisation methods can be exploited to provide a single, reasonably fast algorithm to handle a wide variety of problems of this kind, thereby providing a certain unity. Four running examples illustrate the approach: on the robustness side, algorithmic approximations to minimum covariance determinant (MCD) and least trimmed squares (LTS) estimation; on the diagnostic side, detection of multiple multivariate outliers and global diagnostic use of the likelihood displacement function. The latter is developed here as a global complement to Cook’s (in J. R. Stat. Soc. 48:133–169, 1986) local analysis. Appropriate convergence of each branch of the algorithm is guaranteed for any target functional whose relaxed form is ‘gravitational’, a natural generalisation of concavity introduced here. Moreover, its descent strategy can downweight contaminating cases in the starting position to zero. A simulation study shows that, although not optimised for the LTS problem, our general algorithm holds its own with algorithms that are so optimised. An adapted algorithm relaxes the gravitational condition itself.
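
One way to picture the relaxation in the MCD case is to replace 0/1 subset indicators by continuous weights. The sketch below assumes weights 0 ≤ w_i ≤ 1/h with Σ w_i = 1 (our reading, not necessarily the paper's parametrization): at a vertex, where h weights equal 1/h and the rest are 0, the weighted covariance is exactly the covariance (divisor h) of that h-subset, so the relaxed objective agrees with the discrete one. No optimizer is included.

```python
import numpy as np

def relaxed_logdet(x, w):
    """log det of the weighted covariance sum_i w_i (x_i - m)(x_i - m)^T,
    with m = sum_i w_i x_i, for weights w_i >= 0 summing to 1."""
    w = np.asarray(w, dtype=float)
    centred = x - w @ x
    cov = (centred * w[:, None]).T @ centred
    return np.linalg.slogdet(cov)[1]

rng = np.random.default_rng(8)
x = rng.normal(size=(10, 2))
subset = [0, 2, 3, 5, 7, 8]
w = np.zeros(len(x))
w[subset] = 1.0 / len(subset)        # a vertex: uniform weight on one h-subset
discrete = np.log(np.linalg.det(np.cov(x[subset], rowvar=False, bias=True)))
print(relaxed_logdet(x, w), discrete)   # identical up to rounding
```

Between vertices the objective varies smoothly with `w`, which is what allows nonlinear optimisation methods to be applied instead of subset enumeration.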

Detection of influential observations on the error rate based on the generalized k-means clustering procedure
Ruwet, Christel ULg; Haesbroeck, Gentiane ULg

Conference (2009, October 14)

Cluster analysis may be performed when one wishes to group similar objects into a given number of clusters. Several algorithms are available to construct these clusters. In this talk, the focus will be on the generalized k-means algorithm, while the data of interest are assumed to come from an underlying population consisting of a mixture of two groups. Among the outputs of this clustering technique, a classification rule is provided in order to classify the objects into one of the clusters. When classification is the main objective of the statistical analysis, performance is often measured by means of an error rate $\mathrm{ER}(F; F_m)$, where $F$ is the distribution of the training sample used to set up the classification rule and $F_m$ (the model distribution) is the distribution under which the quality of the rule is assessed (via a test sample). Under contamination, one has to replace the distribution $F$ of the training sample by a contaminated one, $F_\varepsilon$ say (where $\varepsilon$ is the fraction of contamination). In that case, the error rate will be corrupted since it relies on a contaminated rule, while the test sample may still be considered as distributed according to the model distribution. To measure the robustness of classification based on this clustering procedure, influence functions of the error rate may be computed. The idea has already been exploited by Croux et al. (2008) and Croux et al. (2008) in the context of linear and logistic discrimination. In this setup, the contaminated distribution takes the form $F_\varepsilon = (1-\varepsilon) F_m + \varepsilon \Delta_x$, where $\Delta_x$ is the Dirac distribution putting all its mass at $x$. After studying the influence function of the error rate of the generalized k-means procedure, which depends on the influence functions of the generalized k-means centers derived by Garcia-Escudero and Gordaliza (1999), a diagnostic tool based on its value will be presented. The aim is to detect observations in the training sample that can be influential for the error rate.
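
For a concrete feel of the error-rate influence function, the sketch below works at the population level in one dimension. It assumes Fm is the balanced mixture 0.5 N(−μ, 1) + 0.5 N(μ, 1), takes the squared penalty (plain 2-means) as the generalized k-means member, and approximates the influence of a point mass at x by finite differences instead of deriving it analytically. Because the 2-means cut-point t = 0 already minimizes the error rate under this symmetric model, the finite-difference slope (ER(F_ε; Fm) − ER(Fm; Fm))/ε shrinks towards zero with ε.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def error_rate(t, mu):
    """Error rate under Fm = 0.5 N(-mu, 1) + 0.5 N(mu, 1) of the rule
    'assign to the right-hand group iff x > t'."""
    return 0.5 * (1.0 - Phi(t + mu)) + 0.5 * Phi(t - mu)

def two_means_cut(mu, eps=0.0, x=0.0, n_iter=200):
    """Population 2-means cut-point under F_eps = (1 - eps) Fm + eps Delta_x,
    obtained by iterating t -> (c_left(t) + c_right(t)) / 2."""
    t = 0.0
    for _ in range(n_iter):
        # Mass and partial first moment of Fm below t, using
        # E[X; X <= t] = m Phi(t - m) - phi(t - m) for X ~ N(m, 1).
        low_mass = sum(0.5 * Phi(t - m) for m in (-mu, mu))
        low_sum = sum(0.5 * (m * Phi(t - m) - phi(t - m)) for m in (-mu, mu))
        low_mass, low_sum = (1 - eps) * low_mass, (1 - eps) * low_sum
        # Fm has mean 0, so the partial moment above t is -low_sum.
        up_mass, up_sum = (1 - eps) - low_mass, -low_sum
        # Add the point mass to whichever side of t it falls on.
        if x <= t:
            low_mass, low_sum = low_mass + eps, low_sum + eps * x
        else:
            up_mass, up_sum = up_mass + eps, up_sum + eps * x
        t = 0.5 * (low_sum / low_mass + up_sum / up_mass)
    return t

mu = 2.0
er_model = error_rate(two_means_cut(mu), mu)    # equals Phi(-mu), the Bayes error
ratios = []
for eps in (0.1, 0.01, 0.001):
    excess = error_rate(two_means_cut(mu, eps=eps, x=5.0), mu) - er_model
    ratios.append(excess / eps)                  # finite-difference slope
print(ratios)                                    # shrinks towards 0 as eps decreases
```

A vanishing first-order effect is why second-order quantities become informative at such optimal models.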

Peer Reviewed
Outlier detection with the minimum covariance determinant estimator in practice
Fauconnier, Cécile ULg; Haesbroeck, Gentiane ULg

in Statistical Methodology (2009), 6(4), 363-379

Robust statistics has slowly become familiar to all practitioners. Books entirely devoted to the subject are without any doubt responsible for the increased practice of robust statistics in all fields of application. Even classical books often have at least one chapter (or parts of chapters) that develops robust methodology. The improvement of computing power has also contributed to the development of an ever wider range of available robust procedures. However, this success story now threatens to go into reverse: non-specialists interested in applying robust methodology are faced with a large set of (assumed equivalent) methods and with the over-sophistication of some of them. Which method should one use? How should the (numerous) parameters be optimally tuned? These questions are not easy to answer for non-specialists! One could then argue that default procedures are available in most statistical software (S-PLUS, R, SAS, Matlab, ...). However, using the detection of outliers in multivariate data as an illustration, it is shown that, on the one hand, one would not necessarily feel confident with the output of default procedures and that, on the other hand, thoroughly understanding the tuning parameters involved in the procedures might require extensive research. This is not conceivable when trying to compete with the classical methodology, which (while clearly unreliable) is so straightforward. The aim of the paper is to help practitioners who wish to detect outliers in a multivariate data set reliably. The chosen methodology is the Minimum Covariance Determinant estimator, which is widely available and intuitively appealing.
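
A compact sketch of MCD-based outlier detection follows: approximate the raw MCD by concentration steps (C-steps) from random (p + 1)-point starts, then flag points whose squared robust distance exceeds a chi-square quantile. It deliberately omits the consistency factor and reweighting step that mature implementations (for example R's `robustbase::covMcd` or scikit-learn's `MinCovDet`) apply, so it flags more regular points than they would, which is exactly the kind of tuning issue discussed above. The simulated data, `n_starts`, the 0.975 level and h = ⌊(n + p + 1)/2⌋ are illustrative choices, and the closed-form quantile is valid only for p = 2.

```python
import math
import numpy as np

def mahalanobis_sq(x, centre, cov):
    """Squared Mahalanobis distances of the rows of x."""
    diff = x - centre
    return np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)

def raw_mcd(x, h, n_starts=20, n_csteps=50, seed=0):
    """Approximate raw MCD: C-steps from random (p + 1)-point starts, keeping
    the fit whose covariance determinant is smallest."""
    rng = np.random.default_rng(seed)
    n, p = x.shape
    best_det, best_centre, best_cov = np.inf, None, None
    for _ in range(n_starts):
        idx = rng.choice(n, size=p + 1, replace=False)
        centre, cov = x[idx].mean(axis=0), np.cov(x[idx], rowvar=False)
        for _ in range(n_csteps):
            # C-step: refit on the h points closest to the current fit.
            new_idx = np.sort(np.argsort(mahalanobis_sq(x, centre, cov))[:h])
            if np.array_equal(new_idx, idx):
                break
            idx = new_idx
            centre, cov = x[idx].mean(axis=0), np.cov(x[idx], rowvar=False)
        det = np.linalg.det(cov)
        if det < best_det:
            best_det, best_centre, best_cov = det, centre, cov
    return best_centre, best_cov

rng = np.random.default_rng(4)
x = np.vstack([rng.normal(size=(90, 2)),                       # regular points
               rng.normal(loc=8.0, scale=0.3, size=(10, 2))])  # rows 90-99: outliers
n, p = x.shape
centre, cov = raw_mcd(x, h=(n + p + 1) // 2)
cutoff = -2.0 * math.log(1.0 - 0.975)   # chi-square(2) 0.975 quantile, p = 2 only
flagged = np.flatnonzero(mahalanobis_sq(x, centre, cov) > cutoff)
print(flagged)
```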

Impact of contamination on empirical and theoretical error
Ruwet, Christel ULg; Haesbroeck, Gentiane ULg

Conference (2009, June 18)

Classification analysis makes it possible to group similar objects into a given number of groups by means of a classification rule. Many classification procedures are available: linear discrimination, logistic discrimination, etc. The focus in this poster will be on classification resulting from a clustering analysis. Indeed, among the outputs of classical clustering techniques, a classification rule is provided in order to classify the objects into one of the clusters. More precisely, let $F$ denote the underlying distribution and assume that the generalized k-means algorithm with penalty function is used to construct the $k$ clusters $C_1(F), \dots, C_k(F)$ with centers $T_1(F), \dots, T_k(F)$. When one believes that $k$ true groups exist in the data, classification might be the main objective of the statistical analysis. The performance of a particular classification technique can be measured by means of an error rate. Depending on the availability of data, two types of error rates may be computed: a theoretical one and a more empirical one. In the first case, the rule is estimated on a training sample with distribution $F$, while the classification performance may be evaluated through a test sample distributed according to a model distribution of interest, $F_m$ say. In the second case, the same data are used to set up the rule and to evaluate the performance. Under contamination, one has to replace the distribution $F$ of the training sample by a contaminated one, $F_\varepsilon$ say (where $\varepsilon$ is the fraction of contamination). In that case, the theoretical error rate will be corrupted since it relies on a contaminated rule, but it may still use a test sample distributed according to the model distribution. The empirical error rate will be affected twice: via the rule and via the sample used to evaluate the classification performance. To measure the robustness of classification based on clustering, influence functions of the error rate may be computed. The idea has already been exploited by Croux et al. (2008) and Croux et al. (2008) in the context of linear and logistic discrimination. In the computation of influence functions, the contaminated distribution takes the form $F_\varepsilon = (1-\varepsilon) F_m + \varepsilon \Delta_x$, where $\Delta_x$ is the Dirac distribution putting all its mass at $x$. Interestingly, the impact of the point mass $x$ may be positive, i.e. it may decrease the error rate, when the data at hand are used to evaluate the error.
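
The last claim can be checked on a hand-made toy example. The sketch below computes the exact 2-means partition of a univariate sample (in one dimension the optimal clusters are contiguous, so trying every split point suffices), maps the left cluster to group 0 and the right one to group 1, and evaluates the empirical error rate on the same sample. Adding a single group-1 point at x = 4 lowers that error rate from 2/8 to 1/9. The numbers are chosen for illustration and are not taken from the poster.

```python
import numpy as np

def exact_two_means_1d(x):
    """Exact 1D 2-means: optimal clusters are contiguous in sorted order,
    so every split point is tried. Returns (left centre, right centre)."""
    xs = np.sort(x)
    best_cost, best_centres = np.inf, None
    for k in range(1, len(xs)):
        left, right = xs[:k], xs[k:]
        cost = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if cost < best_cost:
            best_cost, best_centres = cost, (left.mean(), right.mean())
    return best_centres

def empirical_error(x, labels):
    """Apparent error rate: the same sample builds and evaluates the rule.
    The left cluster is mapped to group 0 and the right one to group 1."""
    c_left, c_right = exact_two_means_1d(x)
    predicted = (np.abs(x - c_right) < np.abs(x - c_left)).astype(int)
    return np.mean(predicted != labels)

x = np.array([-3.0, -2.0, -1.0, 0.4, 1.0, 2.0, 3.0, -0.4])
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(empirical_error(x, labels))                  # 0.25
x_c, labels_c = np.append(x, 4.0), np.append(labels, 1)   # point mass at x = 4
print(empirical_error(x_c, labels_c))              # 1/9, about 0.111
```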
