Effective proximity retrieval by ordering permutations

被引:115
作者
Chavez, Edgar [1 ]
Figueroa, Karina [1 ]
Navarro, Gonzalo [2 ]
机构
[1] Univ Michoacana, Fac Ciencias Fis & Matemat, Morelia 58000, Michoacan, Mexico
[2] Univ Chile, Dept Comp Sci, Santiago, Chile
关键词
similarity searching; metric spaces; indexing methods; information search and retrieval; pattern recognition;
D O I
10.1109/TPAMI.2007.70815
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We introduce a new probabilistic proximity search algorithm for range and K-nearest neighbor (K-NN) searching in both coordinate and metric spaces. Although there exist solutions for these problems, they boil down to a linear scan when the space is intrinsically high dimensional, as is the case in many pattern recognition tasks. This, for example, renders the K-NN approach to classification rather slow in large databases. Our novel idea is to predict closeness between elements according to how they order their distances toward a distinguished set of anchor objects. Each element in the space sorts the anchor objects from closest to farthest to it and the similarity between orders turns out to be an excellent predictor of the closeness between the corresponding elements. We present extensive experiments comparing our method against state-of-the-art exact and approximate techniques, both in synthetic and real, metric and nonmetric databases, measuring both CPU time and distance computations. The experiments demonstrate that our technique almost always improves upon the performance of alternative techniques, in some cases by a wide margin.
引用
收藏
页码:1647 / 1658
页数:12
相关论文
共 44 条
[1]  
AGGARWAL CC, 2001, P SIGMOD PODS, V1, P13
[2]  
AGGARWAL CC, 2001, P 8 INT C DAT THEOR, P420
[3]  
[Anonymous], P 21 INT C VER LARG
[4]  
Arslan AN, 2000, J DISCRETE ALGORITHM, V1, P3
[5]  
ARYA S, 1994, PROCEEDINGS OF THE FIFTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P573
[6]  
Baeza-Yates R., 1994, Combinatorial Pattern Matching. 5th Annual Symposium, CPM 94. Proceedings, P198
[7]  
Baeza-Yates R.A., 1999, Modern Information Retrieval
[8]   MULTIDIMENSIONAL BINARY SEARCH TREES USED FOR ASSOCIATIVE SEARCHING [J].
BENTLEY, JL .
COMMUNICATIONS OF THE ACM, 1975, 18 (09) :509-517
[9]  
BURKHARD WA, 1973, COMMUN ACM, V16, P230, DOI 10.1145/362003.362025
[10]  
BUSTOS B, 2003, J DISCRETE ALGORITHM, V2, P115