A proposal for robust curve clustering

Times cited: 82
Authors
García-Escudero, LA [1]
Gordaliza, A [1]
Affiliation
[1] Univ Valladolid, Dept Estadist & Invest Operat, E-47002 Valladolid, Spain
Keywords
functional data; clustering; k-means; trimmed k-means; robustness
DOI
10.1007/s00357-005-0013-8
Chinese Library Classification (CLC) number
O1 [Mathematics]
Discipline codes
0701; 070101
Abstract
Functional data sets appear in many areas of science. Although each data point may be seen as a large finite-dimensional vector, it is preferable to think of it as a function, and many classical multivariate techniques have been generalized to this kind of data. A widely used technique for dealing with functional data is to choose a finite-dimensional basis and find the best projection of each curve onto this basis. Given such a basis, one approach to curve clustering is to apply the k-means methodology to the fitted basis coefficients of all the curves in the data set. Unfortunately, this approach suffers from a serious drawback: the lack of robustness of k-means. Trimmed k-means clustering (Cuesta-Albertos, Gordaliza, and Matran 1997) provides a robust alternative to k-means and may therefore be used successfully in this functional framework. The proposed approach is illustrated with cubic B-spline bases, but other bases can be applied analogously, depending on the application at hand.
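The two-stage pipeline described in the abstract (least-squares projection of each curve onto a cubic B-spline basis, then trimmed k-means on the fitted coefficient vectors) can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' implementation: the function names (`clamped_knots`, `bspline_basis`, `fit_coefficients`, `trimmed_kmeans`), the equally spaced clamped knots, the trimming proportion `alpha`, the number of random starts and the synthetic curves are all assumptions. Trimmed k-means is computed here with one common iterative scheme ("concentration" steps): assign every point to its nearest center, discard the proportion `alpha` of points farthest from their centers, and recompute the centers from the points that remain.

```python
# Minimal sketch (not the authors' code): project curves onto a cubic B-spline
# basis by least squares, then cluster the coefficients with trimmed k-means.
# All names, the knot placement and the default settings are illustrative.
import numpy as np


def clamped_knots(a, b, n_interior, degree=3):
    """Clamped knot vector with equally spaced interior knots on [a, b]."""
    interior = np.linspace(a, b, n_interior + 2)[1:-1]
    return np.concatenate([[a] * (degree + 1), interior, [b] * (degree + 1)])


def bspline_basis(x, knots, degree=3):
    """Evaluate all B-spline basis functions at x via the Cox-de Boor recursion.

    Returns an array of shape (len(x), len(knots) - degree - 1).
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    # Degree 0: indicators of the half-open knot spans [t_i, t_{i+1}).
    B = np.array([(t[i] <= x) & (x < t[i + 1]) for i in range(len(t) - 1)],
                 dtype=float).T
    # Close the last non-empty span so the right endpoint b is covered.
    last = np.nonzero(t[:-1] < t[1:])[0].max()
    B[x == t[-1], last] = 1.0
    for d in range(1, degree + 1):
        B_next = np.zeros((len(x), len(t) - d - 1))
        for i in range(len(t) - d - 1):
            left, right = t[i + d] - t[i], t[i + d + 1] - t[i + 1]
            if left > 0:  # terms with a zero denominator are taken as 0
                B_next[:, i] += (x - t[i]) / left * B[:, i]
            if right > 0:
                B_next[:, i] += (t[i + d + 1] - x) / right * B[:, i + 1]
        B = B_next
    return B


def fit_coefficients(t, Y, knots, degree=3):
    """Least-squares basis coefficients, one row per curve (rows of Y)."""
    B = bspline_basis(t, knots, degree)
    coef, *_ = np.linalg.lstsq(B, np.asarray(Y, dtype=float).T, rcond=None)
    return coef.T


def _nearest(X, centers):
    """Index of, and squared distance to, the closest center for each row."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    return nearest, d2[np.arange(len(X)), nearest]


def trimmed_kmeans(X, k, alpha, n_starts=20, n_iter=100, seed=0):
    """Trimmed k-means by concentration steps from several random starts.

    Returns (centers, labels, objective); labels == -1 marks trimmed rows.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    h = n - int(np.floor(n * alpha))  # number of observations kept
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        centers = X[rng.choice(n, size=k, replace=False)]
        for _ in range(n_iter):
            nearest, dmin = _nearest(X, centers)
            keep = np.argsort(dmin)[:h]  # drop the n - h farthest points
            new = centers.copy()
            for j in range(k):
                members = keep[nearest[keep] == j]
                if len(members):  # an empty cluster keeps its old center
                    new[j] = X[members].mean(axis=0)
            if np.allclose(new, centers):
                break
            centers = new
        nearest, dmin = _nearest(X, centers)
        keep = np.argsort(dmin)[:h]
        objective = dmin[keep].sum()  # trimmed within-cluster sum of squares
        if best is None or objective < best[2]:
            labels = np.full(n, -1)
            labels[keep] = nearest[keep]
            best = (centers, labels, objective)
    return best


# Usage with synthetic data: two groups of noisy curves plus 4 outlying curves.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 50)
group_a = np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal((25, 50))
group_b = np.cos(2 * np.pi * t) + 0.1 * rng.standard_normal((25, 50))
outliers = 5.0 + 0.1 * rng.standard_normal((4, 50))
Y = np.vstack([group_a, group_b, outliers])  # 54 curves, one per row
C = fit_coefficients(t, Y, clamped_knots(0.0, 1.0, n_interior=5))  # 54 x 9
centers, labels, objective = trimmed_kmeans(C, k=2, alpha=0.1)
print(labels)
```

With `alpha = 0.1` and 54 curves, floor(5.4) = 5 curves are trimmed, so one ordinary curve is discarded along with the four outlying ones; in practice `alpha` should reflect a guess at the contamination level. Several random starts are used because the concentration steps only reach a local optimum.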
Pages: 185-201
Number of pages: 17
References
30 entries in total
[1] Abraham, C.; Cornillon, P. A.; Matzner-Lober, E.; Molinari, N. (2003). Unsupervised curve clustering using B-splines. Scandinavian Journal of Statistics, 30(3), 581-595.
[2] Cadez, I. V. (2000). Proceedings of KDD-2000, Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, p. 140. DOI 10.1145/347090.347119.
[3] Crellin, N. (1997). Computing Sci Stat I.
[4] Cuesta, J. A.; Matran, C. (1988). The strong law of large numbers for k-means and best possible nets of Banach valued random variables. Probability Theory and Related Fields, 78(4), 523-534.
[5] Cuesta-Albertos, J. A. (1997). Annals of Statistics, 25, 553.
[6] Cuesta-Albertos, J. A. (2005). In press, Am. Math. Soc.
[7] de Boor, C. (1978). A Practical Guide to Splines. DOI 10.1007/978-1-4612-6333-3.
[8] De Soete, G. (1993). Information and Classification.
[9] Eubank, R. L. (1988). Spline Smoothing and Nonparametric Regression.
[10] Flury, B. D.; Tarpey, T. (1993). Representing a large collection of curves: a case for principal points. The American Statistician, 47(4), 304-306.