Parallel mining of outliers in large database

Cited by: 37

Authors:

Hung, E [1]

Cheung, DW [1]

Affiliation:

[1] Univ Hong Kong, Dept Comp Sci & Informat Syst, Hong Kong, Hong Kong, Peoples R China

Source:

DISTRIBUTED AND PARALLEL DATABASES | 2002 / Vol. 12 / Issue 01

Keywords:

data mining; outlier detection; parallel algorithm

DOI:

10.1023/A:1015608814486

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812 [Computer science and technology]

Abstract:

Data mining is a new, important, and fast-growing database application. Outlier (exception) detection is one kind of data mining, with applications such as monitoring credit card fraud and criminal activity in electronic commerce. As databases grow in both size and number of attributes (dimensions), previously proposed detection methods designed for two dimensions are no longer applicable. The time complexity of the Nested-Loop (NL) algorithm (Knorr and Ng, in Proc. 24th VLDB, 1998) is linear in the dimensionality but quadratic in the dataset size, which incurs an unacceptable cost for large datasets. A more efficient version (ENL) and its parallel version (PENL) are introduced. In theory, the performance improvement of PENL is linear in the number of processors, as shown by a performance comparison between ENL and PENL under the Bulk Synchronous Parallel (BSP) model. This improvement is further verified by experiments on an IBM 9076 SP2 parallel computer system. The results show that mining outliers on a low-cost cluster of workstations interconnected by a commodity communication network is a very good choice.
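
To illustrate why a nested-loop approach is quadratic in the dataset size and linear in the dimensionality, here is a minimal Python sketch of naive pairwise distance-based outlier detection. It is not the NL, ENL, or PENL algorithm from the paper: the block-wise I/O strategy and the parallel partitioning are omitted. It assumes the distance-based definition commonly attributed to Knorr and Ng, under which a point is an outlier if at least a fraction `p` of the dataset lies farther than distance `d` from it. The function name, the parameter names, and the choice to exclude the point itself from its neighbor count are illustrative assumptions.

```python
import math


def nested_loop_outliers(points, p, d):
    """Return indices of points flagged as distance-based outliers.

    Assumed definition (hypothetical simplification): a point is an outlier
    if at most floor(n * (1 - p)) OTHER points lie within distance d of it.
    """
    n = len(points)
    max_neighbors = math.floor(n * (1 - p))
    outliers = []
    for i, a in enumerate(points):          # outer loop: n candidates
        neighbors = 0
        for j, b in enumerate(points):      # inner loop: up to n comparisons
            if i == j:
                continue
            # Each distance costs O(k) for k dimensions.
            if math.dist(a, b) <= d:
                neighbors += 1
                if neighbors > max_neighbors:
                    break                   # too many neighbors: not an outlier
        else:
            # Inner loop finished without break: neighbor count stayed small.
            outliers.append(i)
    return outliers


if __name__ == "__main__":
    sample = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
    print(nested_loop_outliers(sample, p=0.8, d=2.0))
```

The early `break` only shortens the inner loop for non-outliers, so the worst case stays at roughly n² distance computations, each proportional to the number of dimensions. Splitting the outer loop across processors is the natural place for the linear speedup the abstract describes, though the paper's actual partitioning scheme is not reproduced here.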

Citation

Pages: 5-26

Page count: 22

References (17 in total)

[1] Agrawal R., 1993, SIGMOD Record, V22, P207, DOI 10.1145/170036.170072

[2] [Anonymous], P 9 INT DAT C IDC 99

[3] [Anonymous], 1997, P CASCON

[4] Barnett V., 1984, Outliers in Statistical Data, V2nd

[5] BISSELING RH, 1993, 836 UTR U DEP MATH

[6] BREUNIG MM, 2000, P ACM SIGMOD 2000 DA

[7] Ester M, 1996, 2 INT C KNOWL DISCOV, P226, DOI 10.5555/3001460.3001507

[8] HAN JW, 1992, PROC INT CONF VERY L, P547

[9] Hawkins D.M, 1980, IDENTIFICATION OUTLI, V11, DOI 10.1007/978-94-015-3994-4

[10] Knorr E. M., 1997, Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, P219
