IMine: Index Support for Item Set Mining

被引:14
作者
Baralis, Elena [1 ]
Cerquitelli, Tania [1 ]
Chiusano, Silvia [1 ]
机构
[1] Politecn Torino, Dipartimento Automat & Informat, I-10129 Turin, Italy
关键词
Data mining; item set extraction; indexing;
D O I
10.1109/TKDE.2008.180
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
This paper presents the IMine index, a general and compact structure which provides tight integration of item set extraction in a relational DBMS. Since no constraint is enforced during the index creation phase, IMine provides a complete representation of the original database. To reduce the I/O cost, data accessed together during the same extraction phase are clustered on the same disk block. The IMine index structure can be efficiently exploited by different item set extraction algorithms. In particular, IMine data access methods currently support the FP-growth and LCM v. 2 algorithms, but they can straightforwardly support the enforcement of various constraint categories. The IMine index has been integrated into the PostgreSQL DBMS and exploits its physical level access methods. Experiments, run for both sparse and dense data distributions, show the efficiency of the proposed index and its linear scalability also for large data sets. Item set mining supported by the IMine index shows performance always comparable with, and often (especially for low supports) better than, state-of-the-art algorithms accessing data on flat file.
引用
收藏
页码:493 / 506
页数:14
相关论文
共 33 条
[1]  
Agrawal N., 1993, IEEE T KNOWLEDGE DAT, V5
[2]  
Agrawal R., 1994, P INT C VER LARG DAT, P487
[3]  
AGRAWAL R, 1993, P ACM SIGMOD 93 MAY
[4]  
AGRAWAL R, 1996, P 2 INT C KNOWL DISC
[5]  
BARALIS E, 2005, P 21 INT C DAT ENG I
[6]  
Bayer R., 1972, Acta Informatica, V1, P173, DOI 10.1007/BF00288683
[7]  
BOTTA M, 2002, P 4 INT C DAT WAR KN
[8]   The hybrid tree: An index structure for high dimensional feature spaces [J].
Chakrabarti, K ;
Mehrotra, S .
15TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 1999, :440-447
[9]  
CHAUDHURI S, 2002, P 18 INT C DAT ENG I
[10]   Mining frequent itemsets without support threshold: With and without item constraints [J].
Cheung, YL ;
Fu, AWC .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2004, 16 (09) :1052-1069