GlOSS: Text-source discovery over the Internet

Cited by: 111
Authors
Gravano, L
García-Molina, H
Tomasic, A
Affiliations
[1] Columbia Univ, Dept Comp Sci, New York, NY 10027 USA
[2] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
[3] INRIA Rocquencourt, Le Chesnay, France
Source
ACM TRANSACTIONS ON DATABASE SYSTEMS | 1999 / Vol. 24 / No. 2
Keywords
performance; measurement; Internet search and retrieval; digital libraries; text databases; distributed information retrieval;
DOI
10.1145/320248.320252
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The dramatic growth of the Internet has created a new problem for users: location of the relevant sources of documents. This article presents a framework for (and experimentally analyzes a solution to) this problem, which we call the text-source discovery problem. Our approach consists of two phases. First, each text source exports its contents to a centralized service. Second, users present queries to the service, which returns an ordered list of promising text sources. This article describes GlOSS, Glossary of Servers Server, with two versions: bGlOSS, which provides a Boolean query retrieval model, and vGlOSS, which provides a vector-space retrieval model. We also present hGlOSS, which provides a decentralized version of the system. We extensively describe the methodology for measuring the retrieval effectiveness of these systems and provide experimental evidence, based on actual data, that all three systems are highly effective in determining promising text sources for a given query.
Pages: 229-264
Page count: 36