Unpublished conference/Abstract (Scientific congresses and symposiums)
Detection of influential observations on the error rate based on the generalized k-means clustering procedure
Ruwet, Christel; Haesbroeck, Gentiane
2009, 17th Annual meeting of the Belgian Statistical Society
 

Annexes
PresentationSBS2009-RUWET.pdf
Publisher postprint (1.72 MB)
Abstract :
[en] Cluster analysis is used to group similar objects into a given number of clusters, and several algorithms are available to construct these clusters. This talk focuses on the generalized k-means algorithm, with the data of interest assumed to come from an underlying population consisting of a mixture of two groups. Among the outputs of this clustering technique is a classification rule that assigns each object to one of the clusters. When classification is the main objective of the statistical analysis, performance is often measured by an error rate ER(F; Fm), where F is the distribution of the training sample used to set up the classification rule and Fm (the model distribution) is the distribution under which the quality of the rule is assessed (via a test sample). Under contamination, the distribution F of the training sample has to be replaced by a contaminated one, F(eps) say, where eps is the fraction of contamination. In that case, the error rate is corrupted because it relies on a contaminated rule, while the test sample may still be considered as distributed according to the model distribution.
To measure the robustness of classification based on this clustering procedure, influence functions of the error rate may be computed. The idea has already been exploited by Croux et al. (2008) and Croux et al. (2008) in the context of linear and logistic discrimination. In this setup, the contaminated distribution takes the form F(eps) = (1 - eps) Fm + eps Dx, where Dx is the Dirac distribution putting all its mass at x. The influence function of the error rate of the generalized k-means procedure is studied first; it depends on the influence functions of the generalized k-means centers derived by Garcia-Escudero and Gordaliza (1999). A diagnostic tool based on its value is then presented. The aim is to detect observations in the training sample which can be influential for the error rate.
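The contamination scheme described in the abstract can be illustrated numerically. The following is only a minimal sketch under stated assumptions, not the authors' method: it uses classical squared-distance k-means in one dimension (taken here as a simple special case of the generalized procedure), a hypothetical balanced mixture of N(-2, 1) and N(2, 1) as the model distribution Fm, and a finite-sample sensitivity curve (ER(F(eps)) - ER(F)) / eps with eps = 1/(n+1) in place of the theoretical influence function. All function names are illustrative.

```python
import math
import numpy as np


def two_means_1d(x, n_iter=100):
    """Lloyd iterations for k = 2 in one dimension (classical k-means).

    Centers start at the sample minimum and maximum, so neither
    cluster can become empty. Returns (lower center, upper center).
    """
    centers = np.array([x.min(), x.max()], dtype=float)
    for _ in range(n_iter):
        cut = centers.mean()                      # boundary between clusters
        new = np.array([x[x < cut].mean(), x[x >= cut].mean()])
        if np.allclose(new, centers):
            break
        centers = new
    return centers


def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def error_rate(centers, mu=(-2.0, 2.0), priors=(0.5, 0.5)):
    """Error rate of the rule 'x < midpoint of the centers -> group 1',
    evaluated under the ASSUMED model priors[0]*N(mu[0],1) + priors[1]*N(mu[1],1).
    """
    cut = 0.5 * (centers[0] + centers[1])
    miss_group1 = 1.0 - norm_cdf(cut - mu[0])     # group 1 falls above the cut
    miss_group2 = norm_cdf(cut - mu[1])           # group 2 falls below the cut
    return priors[0] * miss_group1 + priors[1] * miss_group2


def empirical_influence(sample, x):
    """Sensitivity curve: change in error rate when x joins the training
    sample, divided by the contamination fraction eps = 1 / (n + 1)."""
    n = len(sample)
    er_clean = error_rate(two_means_1d(sample))
    er_contaminated = error_rate(two_means_1d(np.append(sample, x)))
    return (er_contaminated - er_clean) * (n + 1)


# Hypothetical training sample drawn from the assumed model distribution.
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(-2.0, 1.0, 100), rng.normal(2.0, 1.0, 100)])

for x in (0.0, 5.0, 50.0):
    print(f"x = {x:5.1f}  empirical influence on ER = {empirical_influence(sample, x):.3f}")
```

In this sketch a far outlier ends up forming a cluster of its own, which pushes the cut-off far from the two groups and inflates the error rate; a diagnostic along the lines of the abstract would flag training observations for which this quantity is large.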
Disciplines :
Mathematics
Author, co-author :
Ruwet, Christel ;  Université de Liège - ULiège > Département de mathématique > Statistique (aspects théoriques)
Haesbroeck, Gentiane ;  Université de Liège - ULiège > Département de mathématique > Statistique (aspects théoriques)
Language :
English
Title :
Detection of influential observations on the error rate based on the generalized k-means clustering procedure
Publication date :
14 October 2009
Event name :
17th Annual meeting of the Belgian Statistical Society
Event place :
Lommel, Belgium
Event date :
from 14 October 2009 to 16 October 2009
Available on ORBi :
since 04 November 2009

