Extracting accurate and complete results from search engines: Case study Windows Live

Cited by: 46
Authors
Thelwall, Mike [1]
Affiliations
[1] Wolverhampton Univ, Sch Comp & Informat Sci, Wolverhampton WV1 1SB, England
Source
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY | 2008 / Vol. 59 / No. 01
DOI
10.1002/asi.20704
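Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 [Computer science and technology];
Abstract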
Although designed for general Web searching, commercial search engines are also used in Webometrics and related research to produce estimated hit counts or lists of URLs matching a query. Unfortunately, they do not return all matching URLs for a search, and their hit count estimates are unreliable. In this article, we assess whether it is possible to obtain complete lists of matching URLs from Windows Live, and whether any of its hit count estimates are robust. As part of this, we introduce two new methods to extract extra URLs from search engines: automated query splitting and automated domain and TLD searching. Both methods successfully identify additional matching URLs, but the findings suggest that there is no way to get complete lists of matching URLs or accurate hit counts from Windows Live, although some estimating suggestions are provided.
Pages: 38-50
Page count: 13
Related papers
32 in total
[1]
AGUILLO IF, 2006, J AM SOC INFORM SCI, V57, P1269
[2]
Informetric analyses on the World Wide Web: Methodological approaches to 'webometrics' [J].
Almind, TC ;
Ingwersen, P .
JOURNAL OF DOCUMENTATION, 1997, 53 (04) :404-426
[3]
ARASU A, 2001, ACM T INTERNET TECHN, V1, P2, DOI 10.1145/383034.383035
[4]
The use of Web search engines in information science research [J].
Bar-Ilan, J .
ANNUAL REVIEW OF INFORMATION SCIENCE AND TECHNOLOGY, 2004, 38 :231-288
[5]
Evolution, continuity, and disappearance of documents on a specific topic on the web: A longitudinal study of "informetrics" [J].
Bar-Ilan, J ;
Peritz, BC .
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2004, 55 (11) :980-990
[6]
How much information do search engines disclose on the links to a web page? A longitudinal case study of the 'cybermetrics' home page [J].
Bar-Ilan, J .
JOURNAL OF INFORMATION SCIENCE, 2002, 28 (06) :455-466
[7]
Data collection methods on the Web for informetric purposes - A review and analysis [J].
Bar-Ilan, J .
SCIENTOMETRICS, 2001, 50 (01) :7-32
[8]
BARILAN J, SEARCH ENGINE RESULT
[9]
'Mini small worlds' of shortest link paths crossing domain boundaries in an academic Web space [J].
Bjorneborn, Lennart .
SCIENTOMETRICS, 2006, 68 (03) :395-414
[10]
The anatomy of a large-scale hypertextual Web search engine [J].
Brin, S ;
Page, L .
COMPUTER NETWORKS AND ISDN SYSTEMS, 1998, 30 (1-7) :107-117