Summarizing and Understanding Large Graphs

被引:65
作者
Koutra, Danai [1 ]
Kang, U. [2 ]
Vreeken, Jilles [3 ,4 ]
Faloutsos, Christos [1 ]
机构
[1] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
[2] Korea Adv Inst Sci & Technol, Dept Comp Sci, Taejon 305701, South Korea
[3] Max Planck Inst Informat, Databases & Informat Syst, D-66123 Saarbrucken, Germany
[4] Univ Saarland, D-66123 Saarbrucken, Germany
关键词
graph summarization; minimum description length; graph visualization; COMPRESSION;
D O I
10.1002/sam.11267
中图分类号
TP18 [人工智能理论];
学科分类号
140502 [人工智能];
摘要
How can we succinctly describe a million-node graph with a few simple sentences? Given a large graph, how can we find its most "important" structures, so that we can summarize it and easily visualize it? How can we measure the "importance" of a set of discovered subgraphs in a large graph? Starting with the observation that real graphs often consist of stars, bipartite cores, cliques, and chains, our main idea is to find the most succinct description of a graph in these "vocabulary" terms. To this end, we first mine candidate subgraphs using one or more graph partitioning algorithms. Next, we identify the optimal summarization using the minimum description length (MDL) principle, picking only those subgraphs from the candidates that together yield the best lossless compression of the graph-or, equivalently, that most succinctly describe its adjacency matrix. Our contributions are threefold: (i) formulation: we provide a principled encoding scheme to identify the vocabulary type of a given subgraph for six structure types prevalent in real-world graphs, (ii) algorithm: we develop VoG, an efficient method to approximate the MDL-optimal summary of a given graph in terms of local graph structures, and (iii) applicability: we report an extensive empirical evaluation on multimillion-edge real graphs, including Flickr and the Notre Dame web graph. (C) 2015 Wiley Periodicals, Inc. Statistical Analysis and Data Mining: The ASA Data Science Journal 8: 183-202, 2015
引用
收藏
页码:183 / 202
页数:20
相关论文
共 58 条
[1]
Akoglu L., 2012, P ACM INT C MAN DAT
[2]
Akoglu L., 2012, P 21 ACM C INF KNOWL
[3]
Akoglu L., 2013, P SIAM INT C DAT MIN
[4]
[Anonymous], P SIAM INT C DAT MIN
[5]
[Anonymous], SDM WORKSH LINK AN C
[6]
[Anonymous], 2008, P 2008 SIAM INT C DA, DOI DOI 10.1137/1.9781611972788.10
[7]
[Anonymous], P 17 ACM INT C KNOWL
[8]
[Anonymous], P 14 PAC AS C KNOWL
[9]
Graph Compression by BFS [J].
Apostolico, Alberto ;
Drovandi, Guido .
ALGORITHMS, 2009, 2 (03) :1031-1044
[10]
Araujo Miguel, 2014, Machine Learning and Knowledge Discovery in Databases. European Conference, ECML PKDD 2014. Proceedings: LNCS 8724, P50, DOI 10.1007/978-3-662-44848-9_4