Clustering large graphs via the Singular Value Decomposition

被引:289
作者
Drineas, P [1 ]
Frieze, A
Kannan, R
Vempala, S
Vinay, V
机构
[1] Rensselaer Polytech Inst, Dept Comp Sci, Troy, NY 12180 USA
[2] Carnegie Mellon Univ, Dept Math Sci, Pittsburgh, PA 15213 USA
[3] Yale Univ, Dept Comp Sci, New Haven, CT 06520 USA
[4] MIT, Dept Math, Cambridge, MA 02139 USA
[5] Indian Inst Sci, Bangalore 560012, Karnataka, India
基金
美国国家科学基金会;
关键词
Singular Value Decomposition; randomized algorithms; k-means clustering;
D O I
10.1023/B:MACH.0000033113.59016.96
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We consider the problem of partitioning a set of m points in the n-dimensional Euclidean space into k clusters ( usually m and n are variable, while k is fixed), so as to minimize the sum of squared distances between each point and its cluster center. This formulation is usually the objective of the k-means clustering algorithm (Kanungo et al. ( 2000)). We prove that this problem in NP-hard even for k = 2, and we consider a continuous relaxation of this discrete problem: find the k-dimensional subspace V that minimizes the sum of squared distances to V of the m points. This relaxation can be solved by computing the Singular Value Decomposition (SVD) of the m x n matrix A that represents the m points; this solution can be used to get a 2-approximation algorithm for the original problem. We then argue that in fact the relaxation provides a generalized clustering which is useful in its own right. Finally, we show that the SVD of a random submatrix-chosen according to a suitable probability distribution of a given matrix provides an approximation to the SVD of the whole matrix, thus yielding a very fast randomized algorithm. We expect this algorithm to be the main contribution of this paper, since it can be applied to problems of very large size which typically arise in modern applications.
引用
收藏
页码:9 / 33
页数:25
相关论文
共 27 条
  • [1] Achlioptas D., 2001, P 33 ANN ACM S THEOR, P337, DOI 10.1145/380752.380820
  • [2] ACHLIOPTAS D, 2003, UNPUB FAST COMPUTATI
  • [3] SINGULAR VALUE DECOMPOSITION (SVD) IMAGE-CODING
    ANDREWS, HC
    PATTERSON, CL
    [J]. IEEE TRANSACTIONS ON COMMUNICATIONS, 1976, 24 (04) : 425 - 432
  • [4] SINGULAR VALUE DECOMPOSITIONS AND DIGITAL IMAGE-PROCESSING
    ANDREWS, HC
    PATTERSON, CL
    [J]. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1976, 24 (01): : 26 - 53
  • [5] [Anonymous], 2002, THESIS U CALIFORNIA
  • [6] [Anonymous], 1999, 40 ANN S FDN COMP SC
  • [7] Azar Y., 2001, P 33 ANN ACM S THEOR, P619, DOI [10.1145/380752.380859, DOI 10.1145/380752.380859]
  • [8] Bar-Yossef Z., 2003, Proceedings of the 35th Annual ACM Symposium on the Theory of Computing, P335, DOI DOI 10.1145/780542.780593
  • [9] Berry MichaelJ., 1997, DATA MINING TECHNIQU
  • [10] A constant-factor approximation algorithm for the k-median problem
    Charikar, M
    Guha, S
    Tardos, E
    Shmoys, DB
    [J]. JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2002, 65 (01) : 129 - 149