Batch incremental processing for FP-tree construction using FP-Growth algorithm

被引：39

作者：

Totad, Shashikumar G. ^{[1
]}

Geeta, R. B. ^{[1
]}

Reddy, P. V. G. D. Prasad ^{[2
]}

机构：

[1] GMR Inst Technol, Rajam, Andhra Pradesh, India

[2] Andhra Univ, Visakhapatnam, Andhra Pradesh, India

来源：

KNOWLEDGE AND INFORMATION SYSTEMS | 2012年 / 33卷 / 02期

关键词：

Incremental mining; Batch incremental processing; Sequential incremental processing; Frequent patterns; Data mining; FP-tree; FP-Growth; MINING FREQUENT PATTERNS; DATABASES; ITEMSETS;

D O I：

10.1007/s10115-012-0514-9

中图分类号：

TP18 [人工智能理论];

学科分类号：

140502 [人工智能];

摘要：

In the present scenario of global economy and World Wide Web, large sets of evolving and distributed data can be handled efficiently by incremental data mining. Frequent patterns are very important in knowledge discovery and data mining process, such as mining of association rules, correlations. FP-tree is a very versatile data structure used for mining of frequent patterns in knowledge discovery and data mining process. FP-tree is a compact representation of transaction database that contains frequency information of all relevant frequent patterns (FP) of the database. All of the existing incremental frequent pattern mining algorithms, such as AFPIM, CATS, CanTree, CP-tree, and SPO-tree, perform incremental mining by processing one transaction of the incremental part of database at a time and updating it to the FP-tree of initial (original) database. Here, in this paper, we propose a novel method that takes advantage of FP-tree representation of incremental transaction database for incremental mining. We propose a batch incremental processing algorithm BIT_FPGrowth that restructures and merges two small consecutive duration FP-trees to obtain a FP-tree of the FP-Growth algorithm. Our BIT_FPGrowth uses FP-tree as preprocessed data repository to get transactions (i.e., item-sets), unlike other sequential incremental algorithms that read transactions from database. BIT_FPGrowth algorithm takes less time for constructing FP-tree. Our experimental results show that, as the size of the database increases, increase in runtime of BIT_FPGrowth is much less and is least of all the other algorithms.

引用

页码：475 / 490

页数：16

共 24 条

[1]

Agrawal R., 1993, SIGMOD Record, V22, P207, DOI 10.1145/170036.170072

[2]

Agrawal R., P 20 INT C VERY LARG

[3]

[Anonymous], P 1998 ACM SIGMOD IN

[4]

Performance study of distributed Apriori-like frequent itemsets mining [J].

Aouad, Lamine M. ;

Le-Khac, Nhien-An ;

Kechadi, Tahar M. .

KNOWLEDGE AND INFORMATION SYSTEMS, 2010, 23 (01) :55-72

[5]

On closed constrained frequent pattern mining [J].

Bonchi, F ;

Lucchese, C .

FOURTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2004, :35-42

[6]

A survey on algorithms for mining frequent itemsets over data streams [J].

Cheng, James ;

Ke, Yiping ;

Ng, Wilfred .

KNOWLEDGE AND INFORMATION SYSTEMS, 2008, 16 (01) :1-27

[7]

Cheung D. W., 1997, Database Systems for Advanced Applications '97. Proceedings of the Fifth International Conference, P185, DOI 10.1142/9789812819536_0020

[8]

Maintenance of discovered association rules in large databases: Art incremental updating technique [J].