Scientific journals : Article
Physical, chemical, mathematical & earth Sciences : Mathematics
Impact of contamination on training and test error rates in statistical clustering
Ruwet, Christel [Université de Liège - ULg > Département de mathématique > Statistique (aspects théoriques)]
Haesbroeck, Gentiane [Université de Liège - ULg > Département de mathématique > Statistique (aspects théoriques)]
Communications in Statistics : Simulation & Computation
Taylor & Francis Ltd
Clustering analysis ; Error rate ; Generalized k-means ; Influence Function ; Principal points ; Robustness
The k-means algorithm is one of the most common nonhierarchical clustering methods. It constructs clusters so as to minimize the within-cluster sum of squared distances. However, like most estimators defined through objective functions based on global sums of squares, the k-means procedure is not robust to atypical observations in the data. Alternative techniques have therefore been introduced in the literature, e.g. the k-medoids method. The k-means and k-medoids methodologies are particular cases of the generalized k-means procedure. This paper focuses on the error rate these clustering procedures achieve when the data are expected to follow a mixture distribution. Two definitions of the error rate, training and test, are considered, depending on the data at hand. It is shown that contamination may make one of these two error rates decrease, even under optimal models. The consequences of this are highlighted by comparing the influence functions and breakdown points of these error rates.
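To make the abstract's objects concrete, here is a minimal one-dimensional sketch. It is a hypothetical illustration, not the authors' code, and not the paper's exact training or test error-rate definitions. It runs Lloyd-type iterations of generalized k-means with either a mean update (classical k-means, squared-distance penalty) or a median update (k-medians, absolute-distance penalty, swapped in here as a simple robust stand-in for k-medoids). It then measures the share of clean points whose nearest center disagrees with their known group label. The toy data, the single outlier at 20.0, the starting centers, the functions `nearest`, `lloyd_1d` and `error_rate`, and the convention that the left center corresponds to label 0 are all assumptions.

```python
from statistics import mean, median

# Hypothetical toy sample from a two-group 1-D mixture with known labels
# (made-up numbers, not data from the paper).
group0 = [-3.0, -2.5, -2.0, -1.5, -1.0, 0.5]   # true label 0
group1 = [-0.5, 1.0, 1.5, 2.0, 2.5, 3.0]       # true label 1
clean = group0 + group1
labels = [0] * len(group0) + [1] * len(group1)
contaminated = clean + [20.0]                  # add one gross outlier


def nearest(x, centers):
    """Index of the center closest to x (first one wins on ties)."""
    return min(range(len(centers)), key=lambda j: abs(x - centers[j]))


def lloyd_1d(xs, centers, update=mean, n_iter=100):
    """Lloyd-type iterations for generalized k-means in one dimension.

    update=mean   -> classical k-means (penalty r**2)
    update=median -> k-medians (penalty |r|), a simple robust stand-in
    """
    centers = list(centers)
    for _ in range(n_iter):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in centers]
        for x in xs:
            clusters[nearest(x, centers)].append(x)
        # Update step: recompute each center; an empty cluster keeps its center.
        new = [update(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new == centers:
            break
        centers = new
    return sorted(centers)


def error_rate(xs, ys, centers):
    """Share of points whose nearest center index differs from their label.

    Assumes sorted centers map to labels in order (left center = label 0).
    """
    centers = sorted(centers)
    wrong = sum(nearest(x, centers) != y for x, y in zip(xs, ys))
    return wrong / len(xs)


init = [-1.0, 1.0]
km_clean = lloyd_1d(clean, init)                         # k-means, clean data
km_cont = lloyd_1d(contaminated, init)                   # k-means, with outlier
kmed_cont = lloyd_1d(contaminated, init, update=median)  # k-medians, with outlier

# Each fit is evaluated on the clean points only.
for name, c in [("k-means clean", km_clean),
                ("k-means contaminated", km_cont),
                ("k-medians contaminated", kmed_cont)]:
    print(name, c, round(error_rate(clean, labels, c), 3))
```

On these made-up numbers, the single outlier pulls one k-means center out to 20.0, so every clean point lands in the other cluster and half of them are misclassified. The median update keeps both centers near the two groups. This only illustrates sensitivity to one outlier. It does not reproduce the paper's influence-function or breakdown-point analysis.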
(c) Taylor and Francis Group, 2011.
This is the author's version of the work. It is posted here by permission of Taylor and Francis Group for personal use, not for redistribution.
The definitive version was published in Communications in Statistics - Simulation and Computation, Volume 40 Issue 3, March 2011.