A dual approach to cluster discovery in point event data sets

Cited by: 13
Author
Brimicombe, Allan J. [1]
Affiliation
[1] Univ E London, Ctr Geoinformat Studies, London E16 2RD, England
Keywords
spatial data mining; point events; clustering; hot spots; geocomputation; robust normalisation
DOI
10.1016/j.compenvurbsys.2005.07.004
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Spatial data mining seeks to discover meaningful patterns in data where a prime dimension of interest is geographical location. Consideration of a spatial dimension becomes important where data refer to specific locations and/or have significant spatial dependence that must be taken into account if meaningful patterns are to emerge. For point event data there are two main groups of approaches to identifying clusters. One stems from the statistical tradition of classification, which assigns point events to a spatial segmentation; a popular method is the k-means algorithm. The other broad approach searches for 'hot spots', which can be loosely defined as a localised excess of some incidence rate; examples are GAM (the Geographical Analysis Machine) and kernel density estimation. This paper presents a novel variable-resolution approach to 'hot spot' cluster discovery which acts to define spatial concentrations within the point event data. The 'hot spot' centroids are then used to establish additional distance variables and initial cluster centroids for a k-means classification that produces a segmentation both spatially and by attribute. This dual approach is effective in quickly focusing on rational candidate solutions for the value of k and the choice of initial centroids in the k-means clustering. This is demonstrated through the analysis of a business transactions database. The overall dual approach can be used effectively to explore clusters in very large point event data sets. (C) 2006 Elsevier Ltd. All rights reserved.
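The abstract does not describe how the variable-resolution hot-spot search works, so the sketch below does not attempt it. Instead it takes the hot-spot centroids as a given input (the `hotspots` list is a hypothetical stand-in for that first stage's output) and illustrates only the second stage as the abstract summarises it: each hot-spot centroid adds a distance variable to every point, and the hot spots themselves seed a k-means run, so k equals the number of hot spots. All function names are illustrative assumptions, not the author's code; attribute variables and the robust normalisation listed among the keywords are omitted, and only (x, y) coordinates are used.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def add_hotspot_distances(points: Sequence[Tuple[float, float]],
                          hotspots: Sequence[Tuple[float, float]]) -> List[Vector]:
    """Return (x, y, d_1, ..., d_k): coordinates plus distance to each hot-spot centroid."""
    rows = []
    for x, y in points:
        dists = tuple(math.hypot(x - hx, y - hy) for hx, hy in hotspots)
        rows.append((x, y) + dists)
    return rows


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def _nearest(row: Sequence[float], centroids: Sequence[Vector]) -> int:
    return min(range(len(centroids)), key=lambda j: _sq_dist(row, centroids[j]))


def seeded_kmeans(data: Sequence[Vector], seeds: Sequence[Vector],
                  max_iter: int = 100) -> Tuple[List[int], List[Vector]]:
    """Plain Lloyd's k-means started from explicit seeds, so k = len(seeds)."""
    centroids = [tuple(s) for s in seeds]
    labels: List[int] = []
    dim = len(data[0])
    for _ in range(max_iter):
        # Assignment step: each row joins its nearest centroid.
        new_labels = [_nearest(row, centroids) for row in data]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = tuple(sum(r[d] for r in members) / len(members)
                                     for d in range(dim))
    return labels, centroids


def dual_cluster(points, hotspots):
    """Hypothetical second stage: distance variables plus hot-spot seeded k-means."""
    data = add_hotspot_distances(points, hotspots)
    # Seeds are the hot spots mapped into the same feature space as the points.
    seeds = add_hotspot_distances(hotspots, hotspots)
    return seeded_kmeans(data, seeds)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.4), (10.0, 10.0), (10.3, 9.8), (9.9, 10.4)]
    hot = [(0.2, 0.2), (10.0, 10.0)]
    labels, _ = dual_cluster(pts, hot)
    print(labels)  # [0, 0, 0, 1, 1, 1]
```

Seeding from hot spots makes the run deterministic and fixes k in advance, which is the practical benefit the abstract claims over choosing k and random initial centroids by trial and error.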
Pages: 4-18
Page count: 15