An Iterative MapReduce Based Frequent Subgraph Mining Algorithm

被引:74
作者
Bhuiyan, Mansurul A. [1 ]
Al Hasan, Mohammad [1 ]
机构
[1] Indiana Univ Purdue Univ, Dept Comp Sci, Indianapolis, IN 46202 USA
基金
美国国家科学基金会;
关键词
Frequent sub-graph mining; iterative MapReduce;
D O I
10.1109/TKDE.2014.2345408
中图分类号
TP18 [人工智能理论];
学科分类号
140502 [人工智能];
摘要
Frequent subgraph mining (FSM) is an important task for exploratory data analysis on graph data. Over the years, many algorithms have been proposed to solve this task. These algorithms assume that the data structure of the mining task is small enough to fit in the main memory of a computer. However, as the real-world graph data grows, both in size and quantity, such an assumption does not hold any longer. To overcome this, some graph database-centric methods have been proposed in recent years for solving FSM; however, a distributed solution using MapReduce paradigm has not been explored extensively. Since MapReduce is becoming the de-facto paradigm for computation on massive data, an efficient FSM algorithm on this paradigm is of huge demand. In this work, we propose a frequent subgraph mining algorithm called FSM-H which uses an iterative MapReduce based framework. FSM-H is complete as it returns all the frequent subgraphs for a given user-defined support, and it is efficient as it applies all the optimizations that the latest FSM algorithms adopt. Our experiments with real life and large synthetic datasets validate the effectiveness of FSM-H for mining frequent subgraphs from large graph datasets. The source code of FSM-H is available from www.cs.iupui.edu/similar to alhasan/software/
引用
收藏
页码:608 / 620
页数:13
相关论文
共 43 条
[1]
Afrati FN, 2013, PROC INT CONF DATA, P62, DOI 10.1109/ICDE.2013.6544814
[2]
Agrawal R., P 20 INT C VERY LARG
[3]
[Anonymous], DATA INTENSIVE TEXT
[4]
[Anonymous], 2007, P 2007 ACM SIGMOD IN
[5]
[Anonymous], 2006, ECEASST, P1
[6]
Densest Subgraph in Streaming and MapReduce [J].
Bahmani, Bahman ;
Kumar, Ravi ;
Vassilvitskii, Sergei .
PROCEEDINGS OF THE VLDB ENDOWMENT, 2012, 5 (05) :454-465
[7]
Berendt B, 2006, LECT NOTES ARTIF INT, V4198, P18
[8]
Mining molecular fragments: Finding relevant substructures of molecules [J].
Borgelt, C ;
Berthold, MR .
2002 IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2002, :51-58
[9]
Buehrer G, 2006, IEEE DATA MINING, P97
[10]
Chakravarthy S, 2004, LECT NOTES ARTIF INT, V3056, P341