Efficient mining of association rules in text databases

Times cited: 15
Authors
Holt, JD [1]
Chung, SM [1]
Affiliation
[1] Wright State Univ, Dept Comp Sci & Engn, Dayton, OH 45435 USA
Source
PROCEEDINGS OF THE EIGHTH INTERNATIONAL CONFERENCE ON INFORMATION KNOWLEDGE MANAGEMENT, CIKM'99 | 1999
DOI
10.1145/319950.319981
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we propose two new algorithms for mining association rules between words in text databases. The characteristics of text databases differ considerably from those of retail transaction databases, and existing mining algorithms cannot handle text databases efficiently because of the large number of itemsets (i.e., words) that must be counted. Two well-known mining algorithms, the Apriori algorithm and the Direct Hashing and Pruning (DHP) algorithm, are evaluated in the context of mining text databases and compared with the newly proposed algorithms, Multipass-Apriori (M-Apriori) and Multipass-DHP (M-DHP). The results show that the proposed algorithms perform better on large text databases.
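The abstract names Apriori as a baseline whose cost on text comes from counting very many candidate word sets. As a rough illustration only, the Python sketch below implements classic level-wise Apriori plus rule derivation. It is not the paper's M-Apriori or M-DHP, whose details are not given in this record. It assumes that each document is reduced to its set of distinct words, that support is a document count, and that the function names, sample documents, and thresholds are hypothetical. DHP, the other baseline, additionally hashes the (k+1)-item subsets of each transaction into a bucket table during pass k and uses the bucket counts to prune the next level's candidates; that refinement is not shown.

```python
from collections import Counter
from itertools import combinations


def apriori(docs, min_support):
    """Classic level-wise Apriori over documents treated as sets of words.

    docs: iterable of documents, each an iterable of words (duplicates ignored).
    min_support: minimum number of documents (>= 1) that must contain an itemset.
    Returns {frozenset_of_words: document_count} for every frequent itemset.
    """
    transactions = [frozenset(d) for d in docs]

    # Pass 1: count every distinct word. In a text database this vocabulary
    # is very large, which is the counting cost the abstract points to.
    word_counts = Counter(w for t in transactions for w in t)
    frequent = {frozenset([w]): n for w, n in word_counts.items() if n >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = set()
        for a, b in combinations(frequent, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Counting pass: one scan over all documents per level.
        counts = Counter()
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


def association_rules(freq, min_conf):
    """Derive rules X -> Y with confidence support(X u Y) / support(X) >= min_conf."""
    rules = []
    for itemset, support in freq.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is frequent, so freq[lhs] exists.
                conf = support / freq[lhs]
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, conf))
    return rules


# Hypothetical toy documents; not data from the paper.
docs = [
    "data mining text database".split(),
    "text mining association rules".split(),
    "association rules database mining".split(),
    "text database".split(),
]
frequent_sets = apriori(docs, min_support=2)
for lhs, rhs, conf in association_rules(frequent_sets, min_conf=1.0):
    print(sorted(lhs), "->", sorted(rhs), round(conf, 2))
```

With `min_support=2`, {mining, association, rules} is the only frequent 3-word set in the toy data. The inner loop that tests every candidate against every document is the part whose cost grows with vocabulary size.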
Pages: 234-242
Page count: 9
References (10)
[1] Agrawal R, 1994, P 20 INT C VER LARG, V1215, P487
[2] [Anonymous], 1988, AUTOMATIC TEXT PROCE
[3] [Anonymous], P INT C VER LARG DAT
[4] Brin S, 1997, SIGMOD Record, V26, P255, DOI [10.1145/253262.253327, 10.1145/253262.253325]
[5] Chen MS, Han JW, Yu PS, 1996, Data mining: An overview from a database perspective, IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, V8, N6, P866-883
[6] Gordon JS, 1998, AM HERITAGE, V49, P8
[7] Park JS, 1997, IEEE T KNOWL DATA EN, V9, P813, DOI 10.1109/69.634757
[8] Toivonen H, 1996, PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON VERY LARGE DATA BASES, P134
[9] VOORHEES EM, 1997, 5 TEXT RETR C NAT I
[10] ZAKI MJ, 1997, 651 U ROCH COMP SCI