Density-based clustering for data containing two types of points

Cited by: 50
Authors
Pei, Tao [1]
Wang, Weiyi [1]
Zhang, Hengcai [1]
Ma, Ting [1]
Du, Yunyan [1]
Zhou, Chenghu [1]
Affiliations
[1] Chinese Acad Sci, Inst Geog Sci & Nat Resources Res, State Key Lab Resources & Environm Informat Syst, Beijing 100101, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
density-based cluster; kth nearest distance; Origins and destinations of taxicab trip; ALGORITHM; PATTERNS; DOMAIN;
DOI
10.1080/13658816.2014.955027
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
When only one type of point is distributed in a region, clustered points can be seen as an anomaly. When two different types of points coexist in a region, they overlap at different places with various densities. In such cases, the meaning of a cluster of one type of point may be altered if points of the other type show different densities within the same cluster. If we consider the origins and destinations (OD) of taxicab trips, the clustering of both in the morning may indicate a transportation hub, whereas clustered origins and sparse destinations (a hot spot where taxis are in short supply) could suggest a densely populated residential area. This cannot be identified by previous clustering methods, so it is worthwhile studying a clustering method for two types of points. The concept of two-component clustering is first defined in this paper as a group containing two types of points, at least one of which exhibits clustering. We then propose a density-based method for identifying two-component clusters. The method is divided into four steps. The first estimates the clustering scale of the point data. The second transforms the point data into the 2D density domain, where the x and y axes represent the local density of each type of point around each point, respectively. The third determines the thresholds for extracting the clusters, and the fourth generates two-component clusters using a density-connectivity mechanism. The method is applied to taxicab trip data in Beijing. Three types of two-component clusters are identified: high-density origins and destinations, high-density origins and low-density destinations, and low-density origins and high-density destinations. The clustering results are verified by the spatial relationship between the cluster locations and their land-use types over different periods of the day.
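The four steps described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names `density_domain` and `two_component_clusters` are hypothetical, the clustering scale (`radius`) and the two density thresholds (`thr0`, `thr1`) are passed in by hand rather than estimated as in steps one and three, and the connectivity rule (neighbouring points in the same density category are linked) is a simplifying assumption.

```python
import math
from collections import deque

def density_domain(points, types, radius):
    """Step 2 (sketch): map each point to (type-0 count, type-1 count) within radius."""
    coords = []
    for px, py in points:
        counts = [0, 0]
        for (qx, qy), t in zip(points, types):
            if math.hypot(px - qx, py - qy) <= radius:
                counts[t] += 1  # the point itself is counted
        coords.append((counts[0], counts[1]))
    return coords

def two_component_clusters(points, types, radius, thr0, thr1):
    """Step 4 (sketch): link neighbouring points that share a density category.

    radius, thr0 and thr1 are assumed inputs; the paper estimates them from the data.
    """
    dens = density_domain(points, types, radius)
    # Category = (type 0 dense?, type 1 dense?); (False, False) is treated as noise.
    cats = [(d0 >= thr0, d1 >= thr1) for d0, d1 in dens]
    labels = [-1] * len(points)
    next_id = 0
    for seed in range(len(points)):
        if labels[seed] != -1 or cats[seed] == (False, False):
            continue
        labels[seed] = next_id
        queue = deque([seed])
        while queue:  # breadth-first expansion of the cluster
            i = queue.popleft()
            xi, yi = points[i]
            for j, (xj, yj) in enumerate(points):
                if (labels[j] == -1 and cats[j] == cats[i]
                        and math.hypot(xi - xj, yi - yj) <= radius):
                    labels[j] = next_id
                    queue.append(j)
        next_id += 1
    return labels, cats

# Toy data: type 0 = trip origins, type 1 = trip destinations (made-up coordinates).
points = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (0.2, 0), (0, 0.2),   # mixed group
          (10, 10), (10.1, 10), (10, 10.1), (10.1, 10.1),               # origins only
          (20, 20)]                                                     # isolated point
types = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1]
labels, cats = two_component_clusters(points, types, radius=0.5, thr0=3, thr1=3)
```

In this toy input the first group comes out as a high-density-origin, high-density-destination cluster, the second as high-density origins with low-density destinations, and the isolated destination is left unlabelled (`-1`).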
Pages: 175-193
Number of pages: 19