Effective top-k computation with term-proximity support

被引:3
作者
Zhu, Mingjie [1 ]
Shi, Shuming [2 ]
Li, Mingjing [1 ]
Wen, Ji-Rong [2 ]
机构
[1] Univ Sci & Technol China, Hefei 230026, Peoples R China
[2] Microsoft Res Asia, Beijing 100080, Peoples R China
关键词
Top-k; Dynamic index pruning; Term-proximity; Document structure; Proximity-Probe; Non-linear ranking function; Approximate top-k; TEXT;
D O I
10.1016/j.ipm.2009.04.002
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Modern web search engines are expected to return the top-k results efficiently. Although many dynamic index pruning strategies have been proposed for efficient top-k computation, most of them are prone to ignoring some especially important factors in ranking functions, such as term-proximity (the distance relationship between query terms in a document). In our recent work [Zhu, M., Shi, S., Li, M., & Wen.]. (2007). Effective top-k computation in retrieving structured documents with term-proximity support. In Proceedings of 16th CIKM conference (pp. 771-780)], we demonstrated that, when term-proximity is incorporated into ranking functions, most existing index structures and top-k strategies become quite inefficient. To solve this problem, we built the inverted index based on web page structure and proposed the query processing strategies accordingly. The experimental results indicate that the proposed index structures and query processing strategies significantly improve the top-k efficiency. In this paper, we study the possibility of adopting additional techniques to further improve top-k computation efficiency. We propose a Proximity-Probe Heuristic to make our top-k algorithms more efficient. We also test the efficiency of our approaches on various settings (linear or non-linear ranking functions, exact or approximate top-k processing, etc.). (C) 2009 Elsevier Ltd. All rights reserved.
引用
收藏
页码:401 / 412
页数:12
相关论文
共 29 条
[1]  
Anh V. N., 2006, Proceedings of the Twenty-Ninth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, P372, DOI 10.1145/1148170.1148235
[2]  
Anh V. N, 2006, P 15 ACM INT C INF K, P190, DOI [10.1145/1183614.1183645, DOI 10.1145/1183614.1183645]
[3]  
[Anonymous], P SIGIR
[4]  
BLANCO R, 2008, THESIS
[5]   The anatomy of a large-scale hypertextual Web search engine [J].
Brin, S ;
Page, L .
COMPUTER NETWORKS AND ISDN SYSTEMS, 1998, 30 (1-7) :107-117
[6]  
Broder Andrei Z, 2003, P 12 INT C INF KNOWL, P426
[7]  
BUCKLEY C., 1985, SIGIR 85, P97
[8]  
BURROWS M, 1999, Patent No. 5864863
[9]  
Buttcher S., 2006, P 2006 ACM CIKM INT, P182, DOI [DOI 10.1145/1183614.1183644, 10.1145/1183614.1183644]
[10]  
Carmel D., 2001, SIGIR Forum, P43, DOI 10.1145/383952.383958