A three-year study on the freshness of web search engine databases

Cited by: 34
Author
Lewandowski, Dirk [1]
Affiliation
[1] Hamburg Univ Appl Sci, Fac Design Media & Informat, Dept Informat, D-20099 Hamburg, Germany
Keywords
index freshness; online information retrieval; search engines; World Wide Web
DOI
10.1177/0165551508089396
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper deals with one aspect of the index quality of search engines: index freshness. The purpose is to analyse the update strategies of the major web search engines Google, Yahoo, and MSN/Live.com. We conducted a test of the updates of 40 daily updated pages and 30 irregularly updated pages. We used data from a time span of six weeks in the years 2005, 2006 and 2007. We found that the best search engine in terms of up-to-dateness changes over the years and that none of the engines has an ideal solution for index freshness. Indexing patterns are often irregular, and there seems to be no clear policy regarding when to revisit Web pages. A major problem identified in our research is the delay in making crawled pages available for searching, which differs from one engine to another.
Pages: 817-831
Number of pages: 15