Trimming tools in exploratory data analysis

Times cited: 52
Authors
García-Escudero, LA [1 ]
Gordaliza, A [1 ]
Matrán, C [1 ]
Affiliation
[1] Univ Valladolid, Fac Ciencias, Dept Estadist & Invest Operat, E-47005 Valladolid, Spain
Keywords
cluster analysis; k-means; outlier; robustness; trimmed k-means
DOI
10.1198/1061860031806
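Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract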
Exploratory graphical tools based on trimming are proposed for detecting the main clusters in a given dataset. The trimming is obtained through the trimmed k-means methodology. The analysis always reduces to the examination of real-valued curves, even in the multivariate case. Because the technique is based on a robust clustering criterion, it can handle the presence of different kinds of outliers. An algorithm is proposed to carry out this computer-intensive method. As with classical k-means, the method is especially oriented toward mixtures of spherical distributions. A possible generalization that overcomes this drawback is outlined.
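The abstract does not spell out the algorithm or which curves are plotted. The following is a minimal sketch of the trimmed k-means idea, not the authors' algorithm. It assumes the usual formulation: for a trimming level alpha, discard the floor(n * alpha) points that fit worst and minimize the mean squared Euclidean distance of the remaining h points to their nearest of k centers. The search uses a generic heuristic of random restarts plus "concentration" steps (assign, keep the h closest points, recompute the means). The function names `trimmed_kmeans` and `trimmed_objective_curve` are hypothetical. Plotting the optimal objective against alpha for several k is one plausible example of a real-valued curve of the kind the abstract mentions; it is an assumption, not the paper's definition.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance (this is what favors spherical clusters)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    d = len(points[0])
    return tuple(sum(p[j] for p in points) / len(points) for j in range(d))


def trimmed_kmeans(data: Sequence[Point], k: int, alpha: float,
                   n_starts: int = 20, max_iter: int = 100,
                   seed: int = 0) -> Tuple[List[Point], List[int], float]:
    """Hypothetical heuristic for trimmed k-means (not an exact optimizer).

    Keeps h = n - floor(n * alpha) points and returns (centers, sorted
    indices of kept points, mean squared distance of kept points to their
    nearest center), taking the best result over random restarts.
    """
    rng = random.Random(seed)
    n = len(data)
    h = n - math.floor(n * alpha)  # number of untrimmed points
    best = None
    for _ in range(n_starts):
        centers = [tuple(data[i]) for i in rng.sample(range(n), k)]
        for _ in range(max_iter):
            # (squared distance, label) of each point's nearest center
            nearest = [min((_sq_dist(x, c), j) for j, c in enumerate(centers))
                       for x in data]
            # concentration step: keep the h best-fitting points, trim the rest
            kept = sorted(range(n), key=lambda i: nearest[i][0])[:h]
            new_centers = []
            for j in range(k):
                members = [data[i] for i in kept if nearest[i][1] == j]
                # an empty cluster keeps its previous center
                new_centers.append(_mean(members) if members else centers[j])
            if new_centers == centers:
                break
            centers = new_centers
        # score the final centers on their own h closest points
        scored = sorted((min(_sq_dist(x, c) for c in centers), i)
                        for i, x in enumerate(data))
        obj = sum(dist for dist, _ in scored[:h]) / h
        if best is None or obj < best[2]:
            best = (centers, sorted(i for _, i in scored[:h]), obj)
    return best


def trimmed_objective_curve(data: Sequence[Point], k: int,
                            alphas: Sequence[float], **kwargs) -> List[float]:
    """Approximate optimal trimmed objective as a function of alpha (fixed k)."""
    return [trimmed_kmeans(data, k, a, **kwargs)[2] for a in alphas]


if __name__ == "__main__":
    # toy data: two tight 3x3 grids plus two gross outliers
    a = [(float(i % 3), float(i // 3)) for i in range(9)]
    b = [(10.0 + i % 3, 10.0 + i // 3) for i in range(9)]
    data = a + b + [(100.0, -100.0), (-100.0, 100.0)]
    centers, kept, obj = trimmed_kmeans(data, k=2, alpha=0.1)
    print(sorted(centers), obj)
    for k in (1, 2, 3):
        print(k, trimmed_objective_curve(data, k, [0.0, 0.05, 0.1, 0.2]))
```

On the toy data, trimming 10% of the points lets the two outliers be discarded, so the centers settle at the grid means instead of being pulled toward the outliers. Because the criterion uses plain squared Euclidean distance, the sketch inherits the bias toward spherical clusters noted in the abstract.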
Pages: 434-449
Number of pages: 16