A cost model for query processing in high dimensional data spaces

被引:65
作者
Böhm, C [1 ]
机构
[1] Univ Munich, Inst Comp Sci, D-80538 Munich, Germany
来源
ACM TRANSACTIONS ON DATABASE SYSTEMS | 2000年 / 25卷 / 02期
关键词
performance; theory; cost model; multidimensional index;
D O I
10.1145/357775.357776
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
During the last decade, multimedia databases have become increasingly important in many application areas such as medicine, CAD, geography, and molecular biology. An important research topic in multimedia databases is similarity search in large data sets. Most current approaches that address similarity search use the feature approach, which transforms important properties of the stored objects into points of a high-dimensional space (feature vectors). Thus, similarity search is transformed into a neighborhood search in feature space. Multidimensional index structures are usually applied when managing feature vectors. Query processing can be improved substantially with optimization techniques such as blocksize optimization, data space quantization, and dimension reduction. To determine optimal parameters, an accurate estimate of index-based query processing performance is crucial. In this paper we develop a cost model for index structures for point databases such as the R *-tree and the X-tree. It provides accurate estimates of the number of data page accesses for range queries and nearest-neighbor queries under a Euclidean metric and a maximum metric. The problems specific to high-dimensional data spaces, called boundary effects, are considered. The concept of the fractal dimension is used to take the effects of correlated data into account.
引用
收藏
页码:129 / 178
页数:50
相关论文
共 69 条
  • [1] Agrawal R., 1995, VLDB '95. Proceedings of the 21st International Conference on Very Large Data Bases, P490
  • [2] Agrawal R., 1993, P 4 INT C FDN DAT OR, V730, P69
  • [3] ALTSCHUL SF, 1990, J MOL BIOL, V215, P403, DOI 10.1006/jmbi.1990.9999
  • [4] [Anonymous], 1994, P 2 ACMWORKSHOP ADV
  • [5] [Anonymous], P ACM SIGMOD INT C M
  • [6] [Anonymous], P ACM SIG MOD INT C
  • [7] AREF WG, 1991, PROC INT CONF VERY L, P81
  • [8] Arya S., 1995, Proceedings of the Eleventh Annual Symposium on Computational Geometry, P336, DOI 10.1145/220279.220315
  • [9] ARYA S, 1995, THESIS U MARYLAND CO
  • [10] BECKMANN N, 1990, SIGMOD REC, V19, P322, DOI 10.1145/93605.98741