CMAR: Accurate and efficient classification based on Multiple Class-Association Rules

Cited by: 636
Authors
Li, WM [1]
Han, JW [1]
Pei, J [1]
Affiliation
[1] Simon Fraser Univ, Sch Comp Sci, Burnaby, BC V5A 1S6, Canada
Source
2001 IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS | 2001
DOI
10.1109/ICDM.2001.989541
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Previous studies suggest that associative classification achieves high classification accuracy and is highly flexible in handling unstructured data. However, it still suffers from the huge set of mined rules and, because classification is based on only a single high-confidence rule, from occasional biased classification or overfitting. In this study, we propose a new associative classification method, CMAR (Classification based on Multiple Association Rules). The method extends an efficient frequent pattern mining method, FP-growth, constructs a class distribution-associated FP-tree, and mines large databases efficiently. Moreover, it applies a CR-tree structure to store and retrieve mined association rules efficiently, and prunes rules effectively based on confidence, correlation, and database coverage. Classification is performed by a weighted χ² analysis using multiple strong association rules. Our extensive experiments on 26 databases from the UCI machine learning database repository show that CMAR is consistent, highly effective at classifying various kinds of databases, and has better average classification accuracy than CBA and C4.5. Moreover, our performance study shows that the method is highly efficient and scalable in comparison with other reported associative classification methods.
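The abstract names a weighted χ² analysis over multiple rules but does not state the formula. The following Python sketch illustrates one way such a score could be computed, under stated assumptions: each matching rule's χ² is weighted by its ratio to the largest χ² attainable for the same support counts, and the class whose rule group has the highest total is predicted. The function names, the tuple layout `(label, sup_p, sup_c, sup_pc)`, and the example counts are all hypothetical and are not taken from this record.

```python
from collections import defaultdict


def chi_square(sup_p, sup_c, sup_pc, n):
    """Chi-square statistic of the 2x2 table for a rule P -> c.

    sup_p:  number of records matching the rule body P
    sup_c:  number of records with class c
    sup_pc: number of records matching P and having class c
    n:      total number of records
    Assumes 0 < sup_p < n and 0 < sup_c < n so no expected count is zero.
    """
    observed = [
        sup_pc,                      # P and c
        sup_p - sup_pc,              # P and not c
        sup_c - sup_pc,              # not P and c
        n - sup_p - sup_c + sup_pc,  # not P and not c
    ]
    expected = [
        sup_p * sup_c / n,
        sup_p * (n - sup_c) / n,
        (n - sup_p) * sup_c / n,
        (n - sup_p) * (n - sup_c) / n,
    ]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def max_chi_square(sup_p, sup_c, n):
    """Assumed upper bound: chi-square when P and c overlap as much as possible."""
    return chi_square(sup_p, sup_c, min(sup_p, sup_c), n)


def classify(matching_rules, n):
    """Sum the assumed weight chi2 * chi2 / max_chi2 per class; return the top class."""
    scores = defaultdict(float)
    for label, sup_p, sup_c, sup_pc in matching_rules:
        chi2 = chi_square(sup_p, sup_c, sup_pc, n)
        scores[label] += chi2 * chi2 / max_chi_square(sup_p, sup_c, n)
    return max(scores, key=scores.get), dict(scores)


# Hypothetical rules that all match one test record, over n = 100 training records.
rules = [("yes", 40, 50, 35), ("no", 30, 50, 20), ("no", 10, 50, 9)]
label, scores = classify(rules, n=100)
print(label, scores)
```

Summing over a group of rules, rather than trusting the single highest-confidence rule, is the point the abstract makes against single-rule classification. Dividing by the maximum attainable χ² is an assumed normalization intended to keep rules with large class supports from dominating the total.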
Pages: 369-376
Page count: 8