Finding Frequent Patterns in a Large Sparse Graph*

被引:22
作者
Michihiro Kuramochi
George Karypis
机构
[1] University of Minnesota,Department of Computer Science & Engineering
来源
Data Mining and Knowledge Discovery | 2005年 / 11卷
关键词
pattern discovery; frequent subgraph; graph mining;
D O I
暂无
中图分类号
学科分类号
摘要
Graph-based modeling has emerged as a powerful abstraction capable of capturing in a single and unified framework many of the relational, spatial, topological, and other characteristics that are present in a variety of datasets and application areas. Computationally efficient algorithms that find patterns corresponding to frequently occurring subgraphs play an important role in developing data mining-driven methodologies for analyzing the graphs resulting from such datasets. This paper presents two algorithms, based on the horizontal and vertical pattern discovery paradigms, that find the connected subgraphs that have a sufficient number of edge-disjoint embeddings in a single large undirected labeled sparse graph. These algorithms use three different methods for determining the number of edge-disjoint embeddings of a subgraph and employ novel algorithms for candidate generation and frequency counting, which allow them to operate on datasets with different characteristics and to quickly prune unpromising subgraphs. Experimental evaluation on real datasets from various domains show that both algorithms achieve good performance, scale well to sparse input graphs with more than 120,000 vertices or 110,000 edges, and significantly outperform previously developed algorithms.
引用
收藏
页码:243 / 271
页数:28
相关论文
共 69 条
[1]  
Berman H.M.(2000)The protein data bank Nucleic Acids Research 28 235-242
[2]  
Westbrook J.(1994)Substructure discovery using minimum description length and background knowledge Journal of Artificial Intelligence Research 1 231-255
[3]  
Feng Z.(2000)Graph-based data mining IEEE Intelligent Systems 15 32-41
[4]  
Gilliland G.(1995)Knowledge discovery from structural data Journal of Intelligent Information Systems 5 229-245
[5]  
Bhat T.N.(1993)Identification of tertiary structure resemblance in proteins using a maximal common subgraph isomorphism algorithm Journal of Molecular Biology 229 707-721
[6]  
Weissig H.(1997)Greed is good: Approximating independent sets in sparse and bounded-degree graphs Algorithmica 18 145-163
[7]  
Cook D.J.(1983)Efficient bounds for the stable set, vertex cover, and set packing problems Discrete Applied Mathematics 6 243-254
[8]  
Holder L.B.(2003, March)Complete mining of frequent patterns from graphs: Mining graph data Machine Learning 50 321-354
[9]  
Cook D.J.(2001a)Discovery and evaluation of graph-based hierarchical conceptual clusters Journal of Machine Learning Research 2 19-43
[10]  
Holder L.B.(2001b)Hierarchical conceptual structural clustering International Journal on Artificial Intelligence Tools 10 107-136