Rare itemset mining

Cited by: 44
Authors
Adda, Mehdi [1 ]
Wu, Lei [2 ]
Feng, Yi [3 ]
Affiliations
[1] Univ Montreal, Dept Comp Sci & Operat Res, Montreal, PQ, Canada
[2] Rochester Inst Technol, Dept Software Engn, Rochester, NY USA
[3] Algoma Univ, Dept Comp Sci, Marietta, GA USA
Source
ICMLA 2007: SIXTH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, PROCEEDINGS | 2007
Keywords
DOI
10.1109/ICMLA.2007.106
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A pattern is a collection of events/features that occur together in a transaction database. Previous studies in the field are often dedicated to the problem of frequent pattern mining, where only patterns that appear frequently in the input data are mined. As a result, patterns involving events/features that appear in few data sets are not captured. In some domains, such as the detection of computer attacks or of fraudulent transactions in financial institutions, those patterns, also known as rare patterns, are more interesting than frequent patterns. We propose a framework to represent different categories of interesting patterns and then instantiate it for the specific case of rare patterns. We then present a generic framework to mine patterns based on the Apriori approach. In this paper we are interested in patterns composed of a set of items, also called itemsets. Thus, we instantiate the generalized Apriori framework to mine rare itemsets. The resulting approach is Apriori-like, and the main idea behind it is that if the itemset lattice representing the itemset space in classical Apriori approaches is traversed in a bottom-up manner, properties equivalent to those of the Apriori exploration of frequent itemsets become available for mining rare itemsets. These include an anti-monotone property and a level-wise exploration of the itemset space. As demonstrated by our experiments, our approach is effective in identifying all rare itemsets and is more efficient than the existing approach.
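The level-wise idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several points are assumptions: the function names `support` and `mine_rare_itemsets`, the parameter `min_count` (an absolute support threshold), the choice to count an itemset as rare when its support is strictly below `min_count` (zero-support itemsets included), and the traversal direction, which here starts from the full itemset and moves toward smaller ones. The pruning relies on one fact only: every superset of a rare itemset is also rare, so an itemset is worth testing only if all of its one-item extensions were found rare.

```python
def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def mine_rare_itemsets(transactions, min_count):
    """Return {itemset: support} for all non-empty itemsets with support < min_count.

    Hypothetical sketch: explores the itemset lattice level by level, from the
    largest itemset down to single items, pruning with the rule that a
    candidate is kept only if all its one-item-larger supersets are rare.
    """
    transactions = [frozenset(t) for t in transactions]
    items = frozenset().union(*transactions)

    rare = {}
    level = {items} if items else set()  # start from the full itemset
    while level:
        # Test the candidates of the current size against the threshold.
        rare_here = set()
        for candidate in level:
            count = support(candidate, transactions)
            if count < min_count:
                rare[candidate] = count
                rare_here.add(candidate)

        # Generate candidates one item smaller from the rare sets just found.
        next_level = set()
        for itemset in rare_here:
            if len(itemset) == 1:
                continue  # do not descend to the empty itemset
            for item in itemset:
                smaller = itemset - {item}
                # Keep `smaller` only if every one-item extension is rare.
                if all((smaller | {x}) in rare_here for x in items - smaller):
                    next_level.add(smaller)
        level = next_level
    return rare


# Assumed toy data: four transactions over the items a, b, c.
example = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
result = mine_rare_itemsets(example, min_count=2)
```

On this toy data, with `min_count=2`, the result holds exactly two itemsets, {b, c} and {a, b, c}, each with support 1; the single items and the pairs {a, b} and {a, c} all reach the threshold and are never expanded further.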
Pages: 73 / +
Page count: 2
Related papers
12 in total
[1]  
Agrawal R., P 20 INT C VER LARG, P487
[2]  
BAGWELL P, IDEAL HASH TREES
[3]  
GOETTHALS B, 2003, SURVEY FREQUENT PATT
[4]  
HAN J, 2000, P 2000 ACM SIGMOD IN, P1, DOI DOI 10.1145/342009.335372
[5]  
Lee W, 1998, PROCEEDINGS OF THE SEVENTH USENIX SECURITY SYMPOSIUM, P79
[6]  
Lin DI, 1998, LECT NOTES COMPUT SC, V1377, P105
[7]   A framework for dynamic evidence based medicine using data mining [J].
Masuda, C ;
Sakamoto, N ;
Yamamoto, R .
PROCEEDINGS OF THE 15TH IEEE SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS, 2002, :117-122
[8]  
Pasquier N, 1999, LECT NOTES COMPUT SC, V1540, P398
[9]  
PRATI RC, 2004, SADIO ELECT J INFORM, V6, P53
[10]   A transaction mapping algorithm for frequent itemsets mining [J].
Song, MJ ;
Rajasekaran, S .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2006, 18 (04) :472-481