Robust distance-based clustering with applications to spatial data mining

被引:23
作者
Estivill-Castro, V [1 ]
Houle, ME
机构
[1] Univ Newcastle, Dept Comp Sci & Software Engn, Callaghan, NSW 2308, Australia
[2] Univ Sydney, Basser Dept Comp Sci, Sydney, NSW 2006, Australia
关键词
clustering; spatial databases; data mining; exploratory data analysis; delaunay triangulation; combinatorial optimization; p-median problem; medoids; noise;
D O I
10.1007/s00453-001-0010-1
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
In this paper we present a method for clustering gee-referenced data suitable for applications in spatial data mining, based on the medoid method. The medoid method is related to k-MEANS, With the restriction that cluster representatives be chosen from among the data elements. Although the medoid method in general produces clusters of high quality, especially in the presence of noise, it is often criticized for the Omega (n(2)) time that it requires. Our method incorporates both proximity and density information to achieve high-quality clusters in sub-quadratic time; it does not require that the user specify the number of clusters in advance. The time bound is achieved by means of a fast approximation to the medoid objective function, using Delaunay triangulations to store proximity information.
引用
收藏
页码:216 / 242
页数:27
相关论文
共 93 条
  • [1] Agarwal P. K., 1997, Proceedings of the Thirteenth Annual Symposium on Computational Geometry, P147, DOI 10.1145/262839.262921
  • [2] Aldenderfer M., 1984, Cluster Analysis, DOI DOI 10.4135/9781412983648
  • [3] Ankerst M., 1999, SIGMOD Record, V28, P49, DOI 10.1145/304181.304187
  • [4] [Anonymous], SPATIAL ANAL GIS
  • [5] [Anonymous], 1997, SIGMOD WORKSH RES IS
  • [6] [Anonymous], 1994, COMPUTATIONAL GEOMET
  • [7] [Anonymous], 1989, The Design and Analysis of Spatial Data Structures
  • [8] ARONOFF S, 1993, GEOGRAPHIC INFORMATI
  • [9] Arora S., 1998, P 30 ANN ACM S THEOR
  • [10] Berry MichaelJ., 1997, DATA MINING TECHNIQU