Graph Compression by BFS

Cited by: 65
Authors
Apostolico, Alberto [1, 2]
Drovandi, Guido [3, 4]
Affiliations
[1] Georgia Inst Technol, Coll Comp, 801 Atlantic Dr, Atlanta, GA 30332 USA
[2] Univ Padua, Dipartimento Ingn Informaz, I-35131 Padua, Italy
[3] Univ Rome Tre, Dipartimento Informat & Automaz, I-00146 Rome, Italy
[4] IASI, CNR, I-00185 Rome, Italy
Keywords
data compression; web graph; graph compression; breadth first search; universal codes
DOI
10.3390/a2031031
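Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 [Pattern recognition and intelligent systems]; 0812 [Computer science and technology]; 0835 [Software engineering]; 1405 [Intelligent science and technology]
Abstract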
The Web Graph is a large-scale graph that does not fit in main memory, so that lossless compression methods have been proposed for it. This paper introduces a compression scheme that combines efficient storage with fast retrieval for the information in a node. The scheme exploits the properties of the Web Graph without assuming an ordering of the URLs, so that it may be applied to more general graphs. Tests on some datasets of use achieve space savings of about 10% over existing methods.
Pages: 1031-1044
Page count: 14