Data organization and access for efficient data mining

Cited by: 16
Authors
Dunkel, B [1]
Soparkar, N [1]
Affiliation
[1] Univ Michigan, Ann Arbor, MI 48109 USA
Source
15th International Conference on Data Engineering, Proceedings | 1999
DOI
10.1109/ICDE.1999.754968
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Efficient mining of data presents a significant challenge due to problems of combinatorial explosion in the space and time often required for such processing. While previous work has focused on improving the efficiency of the mining algorithms, we consider how the representation, organization, and access of the data may significantly affect performance, especially when I/O costs are also considered. By a simple analysis and comparison of the counting stage for the Apriori association rules algorithm, we show that a "column-wise" approach to data access is often more efficient than the standard row-wise approach. We also provide the results of empirical simulations to validate our analysis. The key idea in our approach is that counting in the Apriori algorithm with data accessed in a column-wise manner significantly reduces the number of disk accesses required to identify itemsets with a minimum support in the database primarily by reducing the degree to which data and counters need to be repeatedly brought into memory.
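The contrast the abstract draws between row-wise and column-wise counting can be illustrated with a small in-memory sketch. This is not the authors' implementation: the toy transactions, the function names, and the use of per-item transaction-id sets as "columns" are all assumptions made for illustration, and the sketch does not model disk accesses or the paper's I/O analysis. It only shows that both access patterns yield the same support counts, while the column-wise version touches only the columns of the items in each candidate.

```python
from itertools import combinations

# Toy transaction database (hypothetical data, for illustration only).
transactions = [
    {"a", "b", "c"},
    {"a", "c"},
    {"a", "d"},
    {"b", "c", "e"},
]

# Candidate 2-itemsets over the items a, b, c (an assumed candidate set).
candidates = [frozenset(pair) for pair in combinations(["a", "b", "c"], 2)]


def count_row_wise(transactions, candidates):
    """Scan every transaction and test each candidate itemset against it."""
    counts = {c: 0 for c in candidates}
    for row in transactions:
        for c in candidates:
            if c <= row:  # candidate is a subset of this transaction
                counts[c] += 1
    return counts


def to_columns(transactions):
    """Build one column per item: the set of transaction ids that contain it."""
    columns = {}
    for tid, row in enumerate(transactions):
        for item in row:
            columns.setdefault(item, set()).add(tid)
    return columns


def count_column_wise(columns, candidates):
    """For each candidate, read only its items' columns and intersect them."""
    counts = {}
    for c in candidates:
        tid_sets = [columns.get(item, set()) for item in c]
        counts[c] = len(set.intersection(*tid_sets))
    return counts


row_counts = count_row_wise(transactions, candidates)
col_counts = count_column_wise(to_columns(transactions), candidates)
```

In this sketch the row-wise pass visits every transaction for every candidate, whereas the column-wise pass reads two columns per candidate. Whether that translates into fewer disk accesses depends on the storage layout and memory budget, which is what the paper's analysis addresses.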
Pages: 522-529
Number of pages: 8