Iterative structure discovery in graph-based data

被引:7
作者
Coble, JA [1 ]
Rathi, R [1 ]
Cook, DJ [1 ]
Holder, LB [1 ]
机构
[1] Univ Texas, Dept Comp Sci & Engn, Arlington, TX 76019 USA
关键词
structural data mining; incremental discovery; graph partitioning; machine learning;
D O I
10.1142/S0218213005002016
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Much of current data mining research is focused on discovering sets of attributes that discriminate data entities into classes, such as shopping trends for a particular demographic group. In contrast, we are working to develop data mining techniques to discover patterns consisting of complex relationships between entities. Our research is particularly applicable to domains in which the data is event-driven or relationally structured. In this paper we present approaches to address two related challenges; the need to assimilate incremental data updates and the need to mine monolithic datasets. Many realistic problems are continuous in nature and therefore require a data mining approach that can evolve discovered knowledge over time. Similarly, many problems present data sets that are too large to fit into dynamic memory on conventional computer systems. We address incremental data mining by introducing a mechanism for summarizing discoveries from previous data increments so that the globally-best patterns can be computed by mining only the new data increment. To address monolithic datasets we introduce a technique by which these datasets can be partitioned and mined serially with minimal impact on the result quality. We present applications of our work in both the counter-terrorism and bioinformatics domains.
引用
收藏
页码:101 / 124
页数:24
相关论文
共 19 条
[1]  
AGRAWAL R, 1995, P 1 INT C KNOWL DISC
[2]  
AGRAWAL R, 1995, P INT C DAT ENG TAIP
[3]  
[Anonymous], 1997, CVPR
[4]  
BLUM A, 1996, P WORKSH ON LIN ALG
[5]   Substructure Discovery Using Minimum Description Length and Background Knowledge [J].
Cook, Diane J. ;
Holder, Lawrence B. .
JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 1993, 1 :231-255
[6]   Approaches to parallel graph-based knowledge discovery [J].
Cook, DJ ;
Holder, LB ;
Galal, G ;
Maglothin, R .
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2001, 61 (03) :427-446
[7]  
FIDUCCIA CM, 1982, LINEAR TIME HEURISTI, P00175
[8]  
Han J, 2000, ACM SIGMOD INT C MAN
[9]  
HOLDER L, 2002, PATTERN RECOGNITION
[10]  
Hulten G., 2001, KDD 01