Ruwet, Christel (Université de Liège - ULg, Département de mathématique, Statistique mathématique)
Seminar of the Department of Statistics and Operations Research (Departamento de Estadística e Investigación Operativa), UVa
The TCLUST procedure is a new robust clustering method introduced by García-Escudero et al. (2008). Its aim is to find clusters with different scatters and weights. Because the corresponding objective function can be unbounded, a restriction is imposed on the ratio between the largest and smallest eigenvalues of the scatter matrices. Robustness is achieved by trimming a given proportion of the observations. The practitioner has to choose this trimming level as well as the number of clusters; suitable values for both parameters can be obtained through a careful examination of classification trimmed likelihood curves (García-Escudero et al., 2010). The first part of this talk will briefly present this clustering procedure and the related R package (tclust).
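The procedure combines three ingredients: assignment of each observation to the cluster with the highest weighted likelihood, trimming of the least plausible observations, and a bound on the eigenvalue ratio of the scatters. The following is a minimal one-dimensional sketch of how these ingredients interact. It is not the algorithm implemented in the tclust package. In one dimension the scatter matrices reduce to variances, so the restriction becomes max variance / min variance <= c. The function names (`tclust_1d`, `assign`, `restrict`), the rounding of nα, the random-restart scheme and, above all, the crude clipping rule used to enforce the restriction are assumptions made for illustration.

```python
import math
import random


def log_density(x, mean, var):
    """Log of the univariate normal density N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def restrict(variances, c):
    """Crude eigenvalue-ratio restriction (assumption): raise every variance
    to at least max(variances) / c so that max/min <= c. This is NOT the
    optimal truncation used by TCLUST, only a simple way to satisfy it."""
    floor = max(max(variances) / c, 1e-12)
    return [max(v, floor) for v in variances]


def assign(data, centers, variances, weights, n_trim):
    """Assign each point to the cluster maximising log(w_j * phi_j(x)),
    then trim the n_trim points whose best score is lowest."""
    best = []
    for x in data:
        scores = [
            (math.log(w) if w > 0 else -math.inf) + log_density(x, m, v)
            for m, v, w in zip(centers, variances, weights)
        ]
        j = max(range(len(scores)), key=scores.__getitem__)
        best.append((scores[j], j))
    order = sorted(range(len(data)), key=lambda i: best[i][0])
    trimmed = set(order[:n_trim])
    labels = [-1 if i in trimmed else best[i][1] for i in range(len(data))]
    # Trimmed classification log-likelihood of this assignment.
    objective = sum(best[i][0] for i in range(len(data)) if i not in trimmed)
    return labels, objective


def tclust_1d(data, k, alpha, c, n_starts=20, n_iter=100, seed=0):
    """Hypothetical 1-D sketch of trimmed, constrained classification
    clustering with random restarts; returns the best solution found."""
    rng = random.Random(seed)
    n = len(data)
    n_trim = int(round(n * alpha))  # number of trimmed observations
    n_keep = n - n_trim
    best = None
    for _ in range(n_starts):
        centers = rng.sample(list(data), k)  # random initial centers
        variances = [1.0] * k
        weights = [1.0 / k] * k
        labels, objective = None, -math.inf
        for _ in range(n_iter):
            new_labels, objective = assign(data, centers, variances, weights, n_trim)
            if new_labels == labels:
                break  # concentration steps have converged
            labels = new_labels
            for j in range(k):
                members = [x for x, lab in zip(data, labels) if lab == j]
                weights[j] = len(members) / n_keep
                if members:  # empty clusters keep their previous parameters
                    mean = sum(members) / len(members)
                    centers[j] = mean
                    variances[j] = sum((x - mean) ** 2 for x in members) / len(members)
            variances = restrict(variances, c)
        if best is None or objective > best[-1]:
            best = (list(centers), list(variances), list(weights), labels, objective)
    return best
```

A typical call is `tclust_1d(data, k=2, alpha=0.1, c=10)`. Trimmed observations receive the label -1. Several random starts are kept because a single run can get stuck in a poor local solution, for example when an outlier is chosen as an initial center. Evaluating the returned objective over a grid of `alpha` values for several `k` gives a rough analogue of the classification trimmed likelihood curves.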
In the second part of the talk, the robustness of the TCLUST procedure, and more precisely its breakdown behavior, will be studied. We will see that the estimator of the scatter matrices can resist more outliers than the number of trimmed observations. However, the breakdown point of the estimator of the centers is very poor: two observations are sufficient to make the centers break down. This is due to the stringency of the classical breakdown point, which requires the estimator to behave well even on samples that can hardly be clustered. For this reason, Gallegos and Ritter (2005) introduced the restricted breakdown point, which restricts the analysis to the class of “well-separated” data sets. On this class, the breakdown point of the estimator of the centers equals α, the trimming level.
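The claim about two observations can be made concrete with a common formalization, the finite-sample replacement breakdown point. The notation here is ours. Let $\hat m_1, \dots, \hat m_k$ be the estimated centers, and let $X_m$ range over all samples obtained from the sample $X$ of size $n$ by replacing $m$ of its observations with arbitrary points. The breakdown point is then

```latex
\varepsilon_n^*(\hat m, X) = \min\left\{ \frac{m}{n} \;:\; \sup_{X_m} \max_{1 \le j \le k} \bigl\| \hat m_j(X_m) \bigr\| = \infty \right\}
```

In this notation, the statement that two observations suffice means $\varepsilon_n^*(\hat m, X) \le 2/n$ for the TCLUST centers. This bound vanishes as $n$ grows, whatever the trimming level.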