Keeping up with the changing Web

Cited by: 59
Authors
Brewington, BE [1]
Cybenko, G [1]
Affiliations
[1] Dartmouth Coll, Hanover, NH 03755 USA
Funding
U.S. National Science Foundation;
Keywords
Dynamic information sources - Information overload;
DOI
10.1109/2.841784
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Because information depreciates over time, keeping Web pages current presents new design challenges. This article quantifies what "current" means for Web search engines and estimates how often they must reindex the Web to keep current with its changing pages and structure. Most information, from a newspaper story to a temperature sensor measurement to a Web page, is dynamic. When monitoring an information source, when do our previous observations become stale and need refreshing? How can we schedule these refresh operations to satisfy a required level of currency without violating resource constraints, such as bandwidth or computing limitations on how much data can be observed in a given time? The authors investigate the trade-offs involved in monitoring dynamic information sources and discuss the Web in detail, estimating how fast documents change and exploring what constitutes a "current" Web index. For a simple class of Web-monitoring systems (search engines), they combine their idea of currency with actual measured data to estimate revisit rates.
Pages: 52 / +
Page count: 8
Related papers
  • [1] BREWINGTON B, 2000, IN PRESS P 9 INT WOR
  • [2] Coffman E. G. Jr., 1998, Journal of Scheduling, V1, P15, DOI 10.1002/(SICI)1099-1425(199806)1:1<15::AID-JOS3>3.0.CO;2-K
  • [3] CYBENKO GB, 1996, P 9 YAL WORKSH AD LE
  • [4] DOUGLIS F, 1997, P US S INT TECHN SYS
  • [5] Lawrence S, Giles CL. Accessibility of information on the web [J]. Nature, 1999, 400(6740): 107-109
  • [6] Montgomery D.C., 2010, Applied Statistics and Probability for Engineers, 5th ed.
  • [7] Papoulis A., 1984, Probability, Random Variables and Stochastic Processes, 2nd ed.