Composite distance transformation for indexing and k-nearest-neighbor searching in high-dimensional spaces

被引:3
作者
Zhuang, Yi [1 ]
Zhuang, Yue-Ting [1 ]
Wu, Fei [1 ]
机构
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310027, Peoples R China
关键词
centroid distance; k-nearest-neighbor search; start distance;
D O I
10.1007/s11390-007-9027-5
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Due to the famous dimensionality curse problem, search in a high-dimensional space is considered as a "hard" problem. In this paper, a novel composite distance transformation method, which is called CDT, is proposed to support a fast k-nearest-neighbor (k-NN) search in high-dimensional spaces. In CDT, all (n) data points are first grouped into some clusters by a k-Means clustering algorithm. Then a composite distance key of each data point is computed. Finally, these index keys of such n data points are inserted by a partition-based B+-tree. Thus, given a query point, its k-NN search in high-dimensional spaces is transformed into the search in the single dimensional space with the aid of CDT index. Extensive performance studies are conducted to evaluate the effectiveness and efficiency of the proposed scheme. Our results show that this method outperforms the state-of-the-art high-dimensional search techniques, such as the X-Tree, VA-file, iDistance and NB-Tree.
引用
收藏
页码:208 / 217
页数:10
相关论文
共 9 条
  • [1] BECKMANN N, 1990, SIGMOD REC, V19, P322, DOI 10.1145/93605.98741
  • [2] Berchtold S, 1996, PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON VERY LARGE DATA BASES, P28
  • [3] Berchtold S., 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073), P577, DOI 10.1109/ICDE.2000.839456
  • [4] Searching in high-dimensional spaces -: Index structures for improving the performance of multimedia Databases
    Böhm, C
    Berchtold, S
    Keim, D
    [J]. ACM COMPUTING SURVEYS, 2001, 33 (03) : 322 - 373
  • [5] Indexing high-dimensional data for content-based retrieval in large databases
    Fonseca, MJ
    Jorge, JA
    [J]. EIGHTH INTERNATIONAL CONFERENCE ON DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, PROCEEDINGS, 2003, : 267 - 274
  • [6] Guttman A., 1984, P ACM SIGMOD INT C M, P47, DOI DOI 10.1145/602259.602266
  • [7] iDistance:: An adaptive B+-tree based indexing method for nearest neighbor search
    Jagadish, HV
    Ooi, BC
    Tan, KL
    Yu, C
    Zhang, R
    [J]. ACM TRANSACTIONS ON DATABASE SYSTEMS, 2005, 30 (02): : 364 - 397
  • [8] Katamaya N, 1997, PROC ACM SIGMOD, P32
  • [9] Weber R., 1998, Proceedings of the Twenty-Fourth International Conference on Very-Large Databases, P194