MI-File: using inverted files for scalable approximate similarity search

被引:35
作者
Amato, Giuseppe [1 ]
Gennaro, Claudio [1 ]
Savino, Pasquale [1 ]
机构
[1] ISTI CNR, I-56124 Pisa, Italy
关键词
Similarity searching; Access methods; Multimedia information retrieval; NEAREST-NEIGHBOR; PROXIMITY; RETRIEVAL;
D O I
10.1007/s11042-012-1271-1
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
We propose a new efficient and accurate technique for generic approximate similarity searching, based on the use of inverted files. We represent each object of a dataset by the ordering of a number of reference objects according to their distance from the object itself. In order to compare two objects in the dataset, we compare the two corresponding orderings of the reference objects. We show that this representation enables us to use inverted files to obtain very efficiently a very small set of good candidates for the query result. The candidate set is then reordered using the original similarity function to obtain the approximate similarity search result. The proposed technique performs several orders of magnitude better than exact similarity searches, still guaranteeing high accuracy. To also demonstrate the scalability of the proposed approach, tests were executed with various dataset sizes, ranging from 200,000 to 100 million objects.
引用
收藏
页码:1333 / 1362
页数:30
相关论文
共 36 条
[1]   Region proximity in metric spaces and its use for approximate similarity search [J].
Amato, G ;
Rabitti, F ;
Savino, P ;
Zezula, P .
ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2003, 21 (02) :192-227
[2]   Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions [J].
Andoni, Alexandr ;
Indyk, Piotr .
COMMUNICATIONS OF THE ACM, 2008, 51 (01) :117-122
[3]  
[Anonymous], 2008, P 3 INT C SCALABLE I
[4]  
[Anonymous], 1954, THESIS MIT
[5]  
[Anonymous], 2009, P 2 WORKSH VER LARG
[6]  
Bawa M, 2005, WWW 05, DOI [DOI 10.1145/1060745.1060840, 10.1145/1060745.1060840]
[7]  
Beyer K, 1999, LECT NOTES COMPUT SC, V1540, P217
[8]  
Bozkaya T., 1997, SIGMOD Record, V26, P357, DOI 10.1145/253262.253345
[9]  
Brin S., 1995, VLDB '95. Proceedings of the 21st International Conference on Very Large Data Bases, P574
[10]   Effective proximity retrieval by ordering permutations [J].
Chavez, Edgar ;
Figueroa, Karina ;
Navarro, Gonzalo .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2008, 30 (09) :1647-1658