An efficient k-means clustering algorithm:: Analysis and implementation

被引:3551
作者
Kanungo, T
Mount, DM
Netanyahu, NS
Piatko, CD
Silverman, R
Wu, AY
机构
[1] IBM Corp, Almaden Res Ctr, San Jose, CA 95120 USA
[2] Univ Maryland, Dept Comp Sci, College Pk, MD 20742 USA
[3] Bar Ilan Univ, Dept Math & Comp Sci, IL-52900 Ramat Gan, Israel
[4] Johns Hopkins Univ, Appl Phys Lab, Laurel, MD 20723 USA
[5] Univ Maryland, Ctr Automat Res, College Pk, MD 20742 USA
[6] American Univ, Dept Comp Sci & Informat Syst, Washington, DC 20016 USA
基金
美国国家科学基金会;
关键词
pattern recognition; machine learning; data mining; k-means clustering; nearest-neighbor searching; k-d tree; computational geometry; knowledge discovery;
D O I
10.1109/TPAMI.2002.1017616
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In k-means clustering, we are given a set of n data points in d-dimensional space R-d and an integer k and the problem is to determine a set of k points in R-d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyd's algorithm. In this paper, we present a simple and efficient implementation of Lloyd's k-means clustering algorithm, which we call the filtering algorithm. This algorithm is easy to implement, requiring a kd-tree as the only major data structure. We establish the practical efficiency of the filtering algorithm in two ways. First, we present a data-sensitive analysis of the algorithm's running time, which shows that the algorithm runs faster as the separation between clusters increases. Second, we present a number of empirical studies both on synthetically generated data and on real data sets from applications in color quantization, data compression, and image segmentation.
引用
收藏
页码:881 / 892
页数:12
相关论文
共 50 条
  • [1] [Anonymous], 1990, COMPUTATIONAL GEOMET
  • [2] [Anonymous], 1998, EX APPR ALG CLUST SO
  • [3] [Anonymous], ELECT ENG COMPUTER S
  • [4] [Anonymous], CARTR937 U MAR
  • [5] [Anonymous], P CTR GEOM COMP 2 AN
  • [6] ARORA S, 1998, P 30 ANN ACM S THEOR, P106, DOI DOI 10.1145/276698.276718
  • [7] Approximate range searching
    Arya, S
    Mount, DM
    [J]. COMPUTATIONAL GEOMETRY-THEORY AND APPLICATIONS, 2000, 17 (3-4): : 135 - 152
  • [8] An optimal algorithm for approximate nearest neighbor searching in fixed dimensions
    Arya, S
    Mount, DM
    Netanyahu, NS
    Silverman, R
    Wu, AY
    [J]. JOURNAL OF THE ACM, 1998, 45 (06) : 891 - 923
  • [9] BALL GH, 1964, P INT C MICR CIRC TH
  • [10] Bayardo R.J., 1999, P 5 ACM SIGKDD INT C