Consistency of spectral clustering

被引:331
作者
von Luxburg, Ulrike [1 ]
Belkin, Mikhail [2 ]
Bousquet, Olivier [3 ]
机构
[1] Max Planck Inst Biol Cybernet, D-72076 Tubingen, Germany
[2] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[3] Pertinence, F-75002 Paris, France
关键词
spectral clustering; graph Laplacian; consistency; convergence of eigenvectors;
D O I
10.1214/009053607000000640
中图分类号
O21 [概率论与数理统计]; C8 [统计学];
学科分类号
020208 ; 070103 ; 0714 ;
摘要
Consistency is a key property of all statistical procedures analyzing randomly sampled data. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of the popular family of spectral clustering algorithms, which clusters the data with the help of eigenvectors of graph Laplacian matrices. We develop new methods to establish that, for increasing sample size, those eigenvectors converge to the eigenvectors of certain limit operators. As a result, we can prove that one of the two major classes of spectral clustering (normalized clustering) converges under very general conditions, while the other (unnormalized clustering) is only consistent under strong additional assumptions, which are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering.
引用
收藏
页码:555 / 586
页数:32
相关论文
共 51 条
[1]  
ABDULLAEV Z, 1991, SPECTRA SCATTERING A, V5, P1
[2]  
Alavi Y., 1991, Graph theory, combinatorics, and applications, V2, P871
[3]  
ALPERT CJ, 1995, DES AUT CON, P195
[4]  
[Anonymous], 1980, FUNCTIONAL ANAL
[5]  
Anselone P. M., 1971, Collectively compact operator approximation theory and applications to integral equations
[6]  
Anthony M., 2002, LSECDAM200207
[8]  
Bai ZD, 1999, STAT SINICA, V9, P611
[9]  
BAKER C, 1977, NUMERICAL TREATMENT
[10]   A SPECTRAL ALGORITHM FOR ENVELOPE REDUCTION OF SPARSE MATRICES [J].
BARNARD, ST ;
POTHEN, A ;
SIMON, H .
NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS, 1995, 2 (04) :317-334