Efficient semantic search on DHT overlays

被引:23
作者
Zhu, Yingwu [1 ]
Hu, Yiming
机构
[1] Seattle Univ, Dept CSSE, Seattle, WA 98122 USA
[2] Univ Cincinnati, Dept ECECS, Cincinnati, OH 45221 USA
关键词
peer-to-peer; vector space model; locality sensitive hashing; semantic indexing; semantic locating; top term; recall;
D O I
10.1016/j.jpdc.2007.01.005
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
Distributed hash tables (DHTs) excel at exact-match lookups, but they do not directly support complex queries such as semantic search that is based on content. In this paper, we propose a novel approach to efficient semantic search on DHT overlays. The basic idea is to place indexes of semantically close files into same peer nodes with high probability by exploiting information retrieval algorithms and locality sensitive hashing. A query for retrieving semantically close files is answered with high recall by consulting only a small number (e.g., 10-20) of nodes that stores the indexes of the files semantically close to the query. Our approach adds only index information to peer nodes, imposing only a small storage overhead. Via detailed simulations, we show that our approach achieves high recall for queries at very low cost, i.e., the number of nodes visited for a query is about 10-20, independent of the overlay size. (C) 2007 Elsevier Inc. All rights reserved.
引用
收藏
页码:604 / 616
页数:13
相关论文
共 31 条
[1]  
[Anonymous], 2003, SIGIR 03 P 26 ANN IN, DOI DOI 10.1145/860435.860491
[2]  
Broder A. Z., 1998, Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, P327, DOI 10.1145/276698.276781
[3]  
BUCKLEY C, 1985, TR85686 CORN U DEP C
[4]  
Charikar M.S., 2002, Proceedings of the thiry-fourth annual ACM symposium on Theory of computing, P380, DOI DOI 10.1145/509907.509965
[5]  
Cohen E, 2003, IEEE INFOCOM SER, P1261
[6]   Routing indices for peer-to-peer systems [J].
Crespo, A ;
Garcia-Molina, H .
22ND INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS, PROCEEDINGS, 2002, :23-32
[7]  
GUPTA A, 2003, P 1 BIENN C INN DAT
[8]  
HARREN M, 2002, P 1 INT WORKSH PEER
[9]  
Indyk P., 1998, Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, P604, DOI 10.1145/276698.276876
[10]   Data clustering: A review [J].
Jain, AK ;
Murty, MN ;
Flynn, PJ .
ACM COMPUTING SURVEYS, 1999, 31 (03) :264-323