Compressing the graph structure of the web

被引:53
作者
Suel, T [1 ]
Yuan, J [1 ]
机构
[1] Polytech Univ, CIS Dept, Brooklyn, NY 11201 USA
来源
DCC 2001: DATA COMPRESSION CONFERENCE, PROCEEDINGS | 2001年
关键词
D O I
10.1109/DCC.2001.917152
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
A large amount of research has recently focused on the graph structure (or link structure) of the World Wide Web. This structure has proven to be extremely useful for improving the performance of search engines and other tools for navigating the web. However, since the graphs in these scenarios involve hundreds of millions of nodes and even more edges, highly space-efficient data structures are needed to fit the data in memory. A first step in this direction was done by the DEC Connectivity Server, which stores the graph in compressed form. In this paper, we describe techniques for compressing the graph structure of the web, and give experimental results of a prototype implementation. We attempt to exploit a variety of different sources of compressibility of these graphs and of the associated set of URLs in order to obtain good compression performance on a large web graph.
引用
收藏
页码:213 / 222
页数:10
相关论文
共 20 条
[1]  
ADLER M, 2001, IN PRESS P IEEE DAT
[2]  
[Anonymous], 1998, P 7 WORLD WID WEB C
[3]  
[Anonymous], 1998, P 1998 ACM SIGMOD IN
[4]  
BHARAT K, 1998, P 21 INT C RES DEV I
[5]  
BHARAT K, 1998, 7 INT WORLD WID WEB
[6]  
BRODER A, 1998, P 30 IEEE S FDN COMP
[7]   Mining the web's link structure [J].
Chakrabarti, S ;
Dom, BE ;
Kumar, SR ;
Raghavan, P ;
Rajagopalan, S ;
Tomkins, A ;
Gibson, D ;
Kleinberg, J .
COMPUTER, 1999, 32 (08) :60-+
[8]  
Chakrabarti S, 1999, PROCEEDINGS OF THE TWENTY-FIFTH INTERNATIONAL CONFERENCE ON VERY LARGE DATA BASES, P375
[9]  
CHAKRABARTI S, 1999, P 8 INT WORLD WID WE
[10]  
CHEN S, 1996, IEEE DAT COMPR C DCC