CSVD: Clustering and Singular Value Decomposition for approximate similarity search in high-dimensional spaces

被引:45
作者
Castelli, V
Thomasian, A
Li, CS
机构
[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
[2] New Jersey Inst Technol, Dept Comp Sci, Newark, NJ 07102 USA
关键词
multidimensional indexing; singular value decomposition; clustering; multimedia indexing; curse of dimensionality; principal component analysis;
D O I
10.1109/TKDE.2003.1198398
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Nearest-neighbor search of high-dimensionality spaces is critical for,many applications, such as content-based retrieval from multimedia databases, similarity search of patterns in data mining, and nearest-heighbor classification. Unfortunately, even with the aid of the commonly used indexing schemes, the performance of nearest-neighbor (NN) queries deteriorates rapidly with the number of dimensions. We propose a method, called Clustering with Singular Value Decomposition (CSVD), which supports efficient approximate processing of NN queries, while maintaining good precision-recall characteristics. CSVD groups homogeneous points into clusters and separately reduces the dimensionality of each cluster using SVD. Cluster selection for NN queries relies on a branch-and-bound algorithm and within-cluster searches can be performed with traditional or in-memory indexing methods. Experiments with texture vectors extracted from satellite images show that CSVD achieves significantly higher dimensionality reduction than plain SVD for the same Normalized Mean Squared Error (NMSE), which translates into a higher efficiency in processing approximate NN queries.
引用
收藏
页码:671 / 685
页数:15
相关论文
共 29 条
[1]  
AGGARWAL CC, 2000, P ACM SIGMOD INT C M, P70, DOI DOI 10.1145/335191
[2]  
[Anonymous], 1986, PRINCIPAL COMPONENT
[3]  
BEATTY M, 1997, P IEEE INT C IM PROC, V2, P835
[4]  
Berchtold S., 1997, Proceedings of the Sixteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, PODS 1997, P78, DOI 10.1145/263661.263671
[5]  
BERCHTOLD S, 1997, P ACM SIGMOD INT C M, P1
[6]  
Bergman L. D., 2000, International Journal on Digital Libraries, V3, P85
[7]  
Beyer Kevin., 1999, INT C DATABASE THEOR, P217, DOI DOI 10.1007/3-540-49257-7_15
[8]  
BLOTT S, 1997, SIMPLE VECTOR APPROX
[9]  
Cappelli R., 1999, Proceedings. Tenth International Workshop on Database and Expert Systems Applications. DEXA 99, P155, DOI 10.1109/DEXA.1999.795159
[10]  
CASTELLI V, 2000, Patent No. 6122628