Mining border descriptions of emerging patterns from dataset pairs

Cited by: 77
Authors
Dong, GZ [1]
Li, JY
Affiliations
[1] Wright State Univ, Dept CSE, Dayton, OH 45435 USA
[2] Inst Infocomm Res, Singapore, Singapore
Keywords
border algorithms; border descriptions; changes; classification rules; contrasts; differences; emerging patterns; minimal/maximal patterns; trends
DOI
10.1007/s10115-004-0178-1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The mining of changes, differences, or other comparative patterns from a pair of datasets is an interesting problem. This paper focuses on the mining of one type of comparative pattern called emerging patterns. Emerging patterns (EPs) are defined as patterns whose support increases from one dataset to the other by a large ratio. The number of EPs is sometimes huge. To provide a good structure for the mining results and to reduce their size, we use borders to concisely describe large collections of EPs in a lossless way. Such a border consists of only the minimal (under set inclusion) and the maximal EPs in the collection. We also present an algorithm for efficiently computing the borders of some desired EPs by manipulating the input borders only. Our experience with many datasets in the UCI Repository and recent cancer diagnosis datasets demonstrated that both the EP pattern type and our algorithm are useful for building accurate classifiers and for mining multifactor interactions, for example, minimal gene groups potentially responsible for the development of cancer.
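To make the abstract's two central notions concrete, the following is a minimal brute-force sketch in Python. It is not the border algorithm of the paper, which works on input borders only; it simply enumerates every itemset on a toy example. The function names, the threshold `rho`, the toy datasets, and the convention that a pattern with zero support in the first dataset and nonzero support in the second has infinite growth rate are all assumptions made for illustration.

```python
from itertools import combinations


def support(itemset, dataset):
    """Fraction of transactions in `dataset` that contain every item of `itemset`."""
    if not dataset:
        return 0.0
    return sum(1 for t in dataset if itemset <= t) / len(dataset)


def growth_rate(itemset, d1, d2):
    """Ratio of support in d2 to support in d1 (assumed convention for zero support)."""
    s1, s2 = support(itemset, d1), support(itemset, d2)
    if s1 == 0:
        return float("inf") if s2 > 0 else 0.0
    return s2 / s1


def emerging_patterns(d1, d2, rho):
    """Enumerate all non-empty itemsets whose growth rate from d1 to d2 is at least rho.

    Exponential in the number of items; intended only for tiny examples.
    """
    items = sorted(set().union(*d1, *d2))
    eps = []
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            x = frozenset(combo)
            if growth_rate(x, d1, d2) >= rho:
                eps.append(x)
    return eps


def border(patterns):
    """Return (minimal, maximal) members of `patterns` under set inclusion."""
    left = [p for p in patterns if not any(q < p for q in patterns)]
    right = [p for p in patterns if not any(q > p for q in patterns)]
    return left, right


# Hypothetical toy data: d1 is the "background" dataset, d2 the "target" dataset.
d1 = [{"a", "b"}, {"a"}, {"b", "c"}, {"c"}]
d2 = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"a", "b"}]

eps = emerging_patterns(d1, d2, rho=2.0)
left, right = border(eps)
print(len(eps), "EPs")
print("minimal:", sorted(sorted(p) for p in left))
print("maximal:", sorted(sorted(p) for p in right))
```

On this toy input, five itemsets qualify as EPs, while the border holds only two minimal patterns and one maximal pattern. Every EP here lies between some minimal and some maximal pattern, which is the sense in which the pair of bounds describes the collection. A border is lossless in this way only when the collection contains every itemset lying between its bounds, which holds for this example but is not guaranteed for an arbitrary set of patterns.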
Pages: 178-202
Page count: 25