Building efficient and effective metasearch engines

Times cited: 160
Authors
Meng, WY [1]
Yu, C
Liu, KL
Affiliations
[1] SUNY Binghamton, Dept Comp Sci, Binghamton, NY 13902 USA
[2] Univ Illinois, Dept Comp Sci, Chicago, IL 60607 USA
[3] Depaul Univ, Sch Comp Sci Telecommun & Informat Syst, Chicago, IL 60604 USA
Keywords
design; experimentation; performance; collection fusion; distributed collection; distributed information retrieval; information resource discovery; metasearch
DOI
10.1145/505282.505284
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Frequently, the information a user needs is stored in the databases of multiple search engines. It is inconvenient and inefficient for an ordinary user to invoke multiple search engines and identify useful documents in the returned results. To support unified access to multiple search engines, a metasearch engine can be constructed. When a metasearch engine receives a query from a user, it invokes the underlying search engines to retrieve useful information for the user. As a search tool, metasearch engines offer other benefits, such as increasing search coverage of the Web and improving the scalability of search. In this article, we survey techniques that have been proposed to tackle several underlying challenges in building a good metasearch engine. Among the main challenges, the database selection problem is to identify search engines that are likely to return useful documents for a given query. The document selection problem is to determine which documents to retrieve from each identified search engine. The result merging problem is to combine the documents returned from multiple search engines. We also point out some problems that need further research.
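The three problems named in the abstract can be pictured as consecutive stages of one query pipeline. The following is a minimal sketch under stated assumptions, not the survey's method: `Engine`, `estimate`, and `search` are hypothetical stand-ins for real search-engine interfaces; database selection is assumed to be a simple usefulness threshold; document selection is assumed to be a fixed top-k per engine; and result merging uses min-max score normalization followed by a CombSUM-style sum, which is only one of many merging schemes.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Engine:
    name: str
    # Hypothetical: estimated usefulness of this engine for a query.
    estimate: Callable[[str], float]
    # Hypothetical: returns up to k (doc_id, local_score) pairs.
    search: Callable[[str, int], list]

def select_databases(engines, query, threshold):
    """Database selection (assumed rule): keep engines whose estimate exceeds a threshold."""
    return [e for e in engines if e.estimate(query) > threshold]

def select_documents(engine, query, k):
    """Document selection (assumed rule): retrieve the top-k documents from one engine."""
    return engine.search(query, k)

def normalize(results):
    """Min-max normalize one engine's local scores to [0, 1]."""
    if not results:
        return []
    scores = [s for _, s in results]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [(doc, 1.0) for doc, _ in results]
    return [(doc, (s - lo) / (hi - lo)) for doc, s in results]

def merge_results(result_lists):
    """Result merging: sum normalized scores per document, then rank descending."""
    totals = {}
    for results in result_lists:
        for doc, score in normalize(results):
            totals[doc] = totals.get(doc, 0.0) + score
    # Ties are broken by document id so the ranking is deterministic.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))

def metasearch(engines, query, threshold=0.5, k=3):
    chosen = select_databases(engines, query, threshold)
    lists = [select_documents(e, query, k) for e in chosen]
    return merge_results(lists)

# Toy engines with fixed (hypothetical) results.
a = Engine("A", lambda q: 0.9, lambda q, k: [("d1", 10.0), ("d2", 5.0), ("d3", 0.0)][:k])
b = Engine("B", lambda q: 0.7, lambda q, k: [("d2", 3.0), ("d4", 1.0)][:k])
c = Engine("C", lambda q: 0.1, lambda q, k: [("d9", 100.0)][:k])

ranked = metasearch([a, b, c], "example query")
print(ranked)  # [('d2', 1.5), ('d1', 1.0), ('d3', 0.0), ('d4', 0.0)]
```

Engine C is skipped at the database-selection stage, so its high local score never enters the merge. Normalizing before summing is the design choice that keeps one engine's score scale from dominating; document d2 ranks first because two selected engines both return it.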
Pages: 48-89
Page count: 42