Indexing high-dimensional data for content-based retrieval in large databases

被引:31
作者
Fonseca, MJ [1 ]
Jorge, JA [1 ]
机构
[1] Univ Tecn Lisboa, IST, ID, INESC,Dept Informat Syst & Comp Sci, P-1000029 Lisbon, Portugal
来源
EIGHTH INTERNATIONAL CONFERENCE ON DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, PROCEEDINGS | 2003年
关键词
D O I
10.1109/DASFAA.2003.1192391
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Many indexing approaches for high-dimensional data points have evolved into very complex and hard to code algorithms. Sometimes this complexity is not matched by increase in performance. Motivated by these ideas, we take a step back and look at simpler approaches to indexing multimedia data. In this paper we propose a simple, (not simplistic) yet efficient indexing structure for high-dimensional data points of variable dimension, using dimension reduction. Our approach maps multidimensional points to a ID line by computing their Euclidean Norm and use a B+-Tree to store data points. We exploit B+-Tree efficient sequential search to develop simple, yet performant methods to implement point, range and nearest-neighbor queries. To evaluate our technique we conducted a set of experiments, using both synthetic and real data. We analyze creation, insertion and query times as a function of data set size and dimension. Results so far show that our simple scheme outperforms current approaches, such as the Pyramid Technique, the A-Tree and the SR-Tree, for many data distributions. Moreover our approach seems to scale better both with growing dimensionality and data set size, while exhibiting low insertion and search times.
引用
收藏
页码:267 / 274
页数:8
相关论文
共 15 条
  • [1] Berchtold S, 1998, P INT C MAN DAT SIGM
  • [2] BERCHTOLD S, 1996, P 22 INT C VER LARG
  • [3] BOHM SC, 2000, P 16 INT C DAT ENG I, P577
  • [4] The hybrid tree: An index structure for high dimensional feature spaces
    Chakrabarti, K
    Mehrotra, S
    [J]. 15TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 1999, : 440 - 447
  • [5] FONSECA MJ, 2003, INDEXING HIGH DIMENS
  • [6] FONSECA MJ, 2003, IN PRESS COMPUTERS G
  • [7] FONSECA MJ, 2000, CALI SOFTWARE LIB CA
  • [8] The LSDh-tree:: An access structure for feature vectors
    Henrich, A
    [J]. 14TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 1998, : 362 - 369
  • [9] Katayama N., 1997, P ACM SIGMOD, P369
  • [10] LIN KI, 1994, VLDB J, V3, P517, DOI [DOI 10.1007/BF01231606], DOI 10.1007/BF01231606]