Robustness properties of k means and trimmed k means

Cited by: 126

Authors:

García-Escudero, LA [1]

Gordaliza, A [1]

Affiliations:

[1] Univ Valladolid, Fac Ciencias, Dept Estadistica & Invest Operat, E-47002 Valladolid, Spain

Source:

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION | 1999 / Vol. 94 / Issue 447

Keywords:

breakdown point; cluster analysis; influence function; qualitative robustness;

DOI:

10.2307/2670010

Chinese Library Classification:

O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];

Discipline classification codes:

020208 ; 070103 ; 0714 ;

Abstract:

The generalized k means method is based on the minimization of the discrepancy between a random variable (or a sample of this random variable) and a set with k points, measured through a penalty function Phi. As in the M estimators setting (k = 1), a penalty function, Phi, with unbounded derivative, Psi, naturally leads to nonrobust generalized k means. However, surprisingly, the lack of robustness extends also to the case of bounded Psi; that is, generalized k means do not inherit the robustness properties of the M estimator from which they came. Attempting to robustify the generalized k means method, the generalized trimmed k means method arises from combining the k means idea with a so-called impartial trimming procedure. In this article we study generalized k means and generalized trimmed k means performance from the viewpoint of Hampel's robustness criteria; that is, we investigate the influence function, breakdown point, and qualitative robustness, confirming the superiority provided by the trimming. We include the study of two real datasets to make clear the robustness of generalized trimmed k means.
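The impartial trimming idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a Lloyd-type heuristic that assumes the quadratic penalty Phi(t) = t^2 (so each center is updated as a mean), and the function name `trimmed_k_means`, its parameters, and the caller-supplied initial centers are all hypothetical choices made for illustration.

```python
import math

def trimmed_k_means(points, centers, alpha, n_iter=50):
    """Illustrative Lloyd-type heuristic for trimmed k means, assuming Phi(t) = t**2.

    points  : list of equal-length tuples of floats
    centers : list of k initial centers (tuples), supplied by the caller
    alpha   : trimming proportion in [0, 1)
    Returns (centers, kept_indices).
    """
    n = len(points)
    n_keep = n - math.floor(alpha * n)  # number of points left untrimmed
    centers = [tuple(c) for c in centers]
    kept = list(range(n))
    for _ in range(n_iter):
        # Squared distance from each point to its nearest center, and that center's index.
        nearest = []
        for p in points:
            d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            j = min(range(len(centers)), key=d2.__getitem__)
            nearest.append((d2[j], j))
        # Impartial trimming: the data decide which points go, namely the
        # ones farthest from their nearest center.
        order = sorted(range(n), key=lambda i: nearest[i][0])
        kept = sorted(order[:n_keep])
        # Update each center as the mean of the kept points assigned to it.
        new_centers = []
        for j, c in enumerate(centers):
            members = [points[i] for i in kept if nearest[i][1] == j]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(c)  # an empty cluster keeps its previous center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, kept
```

For example, on the one-dimensional sample `[(0.0,), (0.2,), (-0.2,), (10.0,), (10.2,), (9.8,), (1000.0,)]` with initial centers `[(0.0,), (10.0,)]`, calling the sketch with `alpha=0.15` trims the single point at 1000 and leaves the centers near 0 and 10, whereas `alpha=0` (ordinary k means) lets that one outlier capture a center of its own. This is only a toy illustration of the contrast the abstract draws between the untrimmed and trimmed methods.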

Pages: 956-969

Page count: 14

29 references in total

[1] Billingsley P., 1986, PROBABILITY MEASURE
[2] BRONS, HK; BRUNK, HD; FRANCK, WE; HANSON, DL. GENERALIZED MEANS AND ASSOCIATED FAMILIES OF DISTRIBUTIONS [J]. ANNALS OF MATHEMATICAL STATISTICS, 1969, 40 (02) : 339 - &
[3] BUTLER, RW. NONPARAMETRIC INTERVAL AND POINT PREDICTION USING DATA TRIMMED BY A GRUBBS-TYPE OUTLIER RULE [J]. ANNALS OF STATISTICS, 1982, 10 (01) : 197 - 204
[4] COX, DR. NOTE ON GROUPING [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1957, 52 (280) : 543 - 547
[5] CUESTA, JA; MATRAN, C. THE STRONG LAW OF LARGE NUMBERS FOR K-MEANS AND BEST POSSIBLE NETS OF BANACH VALUED RANDOM-VARIABLES [J]. PROBABILITY THEORY AND RELATED FIELDS, 1988, 78 (04) : 523 - 534
[6] Cuesta-Albertos JA, 1997, ANN STAT, V25, P553
[7] Donoho D. L., 1983, FESTSCHRIFT EL LEHMA
[8] FISHER, WD. ON GROUPING FOR MAXIMUM HOMOGENEITY [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1958, 53 (284) : 789 - 798
[9] García-Escudero, LA; Gordaliza, A; Matrán, C. Asymptotics for trimmed k-means and associated tolerance zones [J]. JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 1999, 77 (02) : 247 - 262
[10] GARCIAESCUDERO LA, 1999, IN PRESS ANN STAT
