Searching in metric spaces

被引:678
作者
Chávez, E
Navarro, G
BaezaYates, R
Marroquín, JL
机构
[1] Univ Michoacana, Escuela Ciencias Fisicomatemat, Morelia 58000, Michoacan, Mexico
[2] Univ Chile, Dept Ciencias Computac, Santiago, Chile
[3] CIMAT, Guanajuato 36000, Mexico
关键词
algorithms; curse of dimensionality; nearest neighbors; similarity searching; vector spaces;
D O I
10.1145/502807.502808
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
The problem of searching the elements of a set that are close to a given query element under some similarity criterion has a vast number of applications in many branches of computer science, from pattern recognition to textual and multimedia information retrieval. We are interested in the rather general case where the similarity criterion defines a metric space, instead of the more restricted case of a vector space. Many solutions have been proposed in different areas, in many cases without cross-knowledge. Because of this, the same ideas have been reconceived several times, and very different presentations have been given for the same approaches. We present some basic results that explain the intrinsic difficulty of the search problem. This includes a quantitative definition of the elusive concept of "intrinsic dimensionality." We also present a unified view of all the known proposals to organize metric spaces, so as to be able to understand them under a common framework. Most approaches turn out to be variations on a few different concepts. We organize those works in a taxonomy that allows us to devise new algorithms from combinations of concepts not noticed before because of the lack of communication between different communities. We present experiments validating our results and comparing the existing approaches. We finish with recommendations for practitioners and open questions fur future development.
引用
收藏
页码:273 / 321
页数:49
相关论文
共 78 条
[11]   MULTIDIMENSIONAL BINARY SEARCH TREES USED FOR ASSOCIATIVE SEARCHING [J].
BENTLEY, JL .
COMMUNICATIONS OF THE ACM, 1975, 18 (09) :509-517
[12]   MULTIDIMENSIONAL BINARY SEARCH TREES IN DATABASE APPLICATIONS [J].
BENTLEY, JL .
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 1979, 5 (04) :333-340
[13]  
Berchtold S, 1996, PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON VERY LARGE DATA BASES, P28
[14]   Learning feature relevance and similarity metrics in image databases [J].
Bhanu, B ;
Peng, J ;
Qing, S .
IEEE WORKSHOP ON CONTENT-BASED ACCESS OF IMAGE AND VIDEO LIBRARIES - PROCEEDINGS, 1998, :14-18
[15]  
BIMBO AD, 1998, P IEEE WORKSH CONT B, P35
[16]  
BOHM C, 2002, IN PRESS ACM COMPUT
[17]  
BOZKAYA T, 1997, P ACM SIGMOD INT C M, V26, P357
[18]   Connectivity of the mutual k-nearest-neighbor graph in clustering and outlier detection [J].
Brito, MR ;
Chavez, EL ;
Quiroz, AJ ;
Yukich, JE .
STATISTICS & PROBABILITY LETTERS, 1997, 35 (01) :33-42
[19]  
BURKHARD WA, 1973, COMMUN ACM, V16, P230, DOI 10.1145/362003.362025
[20]   Fixed queries array:: A fast and economical data structure for proximity searching [J].
Chávez, E ;
Marroquín, JL ;
Navarro, G .
MULTIMEDIA TOOLS AND APPLICATIONS, 2001, 14 (02) :113-135