Architectural design and evaluation of an efficient Web-crawling system

Cited by: 15
Authors
Yan, HF [1]
Wang, JY [1]
Li, XM [1]
Guo, L [1]
Affiliation
[1] Peking Univ, Dept Comp Sci & Technol, Comp Networks & Distributed Syst Lab, Beijing 100871, Peoples R China
Keywords
world wide web; web-crawling; scalability; reconfigurability; search engine
DOI
10.1016/S0164-1212(01)00091-7
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
This paper presents the architectural design and evaluation results of an efficient Web-crawling system. The design involves a fully distributed architecture, a URL allocation algorithm, and a method to ensure system scalability and dynamic reconfigurability. Simulation experiments show that load balance, scalability, and efficiency can be achieved in the system. This distributed Web-crawling subsystem has been successfully integrated with WebGather, a well-known Chinese and English Web search engine that aims to collect all the Web pages in China and keep pace with the rapid growth of Chinese Web information. In addition, we believe that the design can also be useful in other contexts, such as digital libraries. (C) 2002 Elsevier Science Inc. All rights reserved.
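The abstract names a URL allocation algorithm and dynamic reconfigurability without giving details. As a rough illustration only, and not the authors' actual method, the sketch below shows one common way such a scheme might work: hash each URL's host name to a fixed logical slot, then map slots to crawler nodes through a table that can be rebuilt when nodes join or leave. The names `NUM_SLOTS`, `slot_for`, `build_table`, and `node_for`, and the round-robin slot assignment, are all assumptions made for this example.

```python
import hashlib
from typing import List
from urllib.parse import urlsplit

NUM_SLOTS = 64  # hypothetical number of logical partitions (assumption)


def slot_for(url: str) -> int:
    """Map a URL to a logical slot by hashing its host name.

    Hashing the host (not the full URL) keeps all pages of one site
    on the same crawler node.
    """
    host = (urlsplit(url).hostname or "").lower()
    digest = hashlib.md5(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SLOTS


def build_table(nodes: List[str]) -> List[str]:
    """Assign every slot to a node, round-robin (an assumed policy)."""
    return [nodes[i % len(nodes)] for i in range(NUM_SLOTS)]


def node_for(url: str, table: List[str]) -> str:
    """Look up the crawler node currently responsible for a URL."""
    return table[slot_for(url)]


if __name__ == "__main__":
    table = build_table(["crawler-0", "crawler-1", "crawler-2"])
    print(node_for("http://www.pku.edu.cn/index.html", table))
    # Reconfiguration: rebuild the slot table for a new node set.
    table = build_table(["crawler-0", "crawler-1", "crawler-2", "crawler-3"])
    print(node_for("http://www.pku.edu.cn/index.html", table))
```

In this sketch the URL-to-slot mapping never changes, so adding or removing a node only requires rebuilding the slot table rather than rehashing every URL.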
Pages: 185-193
Page count: 9