Accounting for boundary effects in nearest-neighbor searching

被引:13
作者
Arya, S
Mount, DM
Narayan, O
机构
[1] UNIV MARYLAND,DEPT COMP SCI,COLLEGE PK,MD 20742
[2] UNIV MARYLAND,INST ADV COMP STUDIES,COLLEGE PK,MD 20742
[3] UNIV CALIF SANTA CRUZ,DEPT PHYS,SANTA CRUZ,CA 95064
关键词
D O I
10.1007/BF02716805
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
Given n data points in d-dimensional space, nearest-neighbor searching involves determining the nearest of these data points to a given query point. Most average-case analyses of nearest-neighbor searching algorithms are made under the simplifying assumption that d is fixed and that n is so large relative to d that boundary effects can be ignored. This means that for any query point the statistical distribution of the data points surrounding it is independent of the location of the query point. However, in many applications of nearest-neighbor searching (such as data compression by vector quantization) this assumption is not met, since the number of data points n grows roughly as 2(d). Largely for this reason, the actual performances of many nearest-neighbor algorithms tend to be much better than their theoretical analyses would suggest. We present evidence of why this is the case. We provide an accurate analysis of the number of cells visited in nearest-neighbor searching by the bucketing and k-d tree algorithms. We assume m(d) points uniformly distributed in dimension d, where m is a fixed integer greater than or equal to 2. Further, we assume that distances are measured in the L(infinity) metric. Our analysis is tight in the limit as d approaches infinity. Empirical evidence is presented showing that the analysis applies even in low dimensions.
引用
收藏
页码:155 / 176
页数:22
相关论文
共 15 条
  • [1] ARFKEN G, 1985, MATH METHODS PHYSICI
  • [2] ARRYA S, 1993, P DCC 9O DAT COMPR C, P381
  • [3] ARYA S, 1993, PROCEEDINGS OF THE FOURTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P271
  • [4] ARYA S, 1994, PROCEEDINGS OF THE FIFTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P573
  • [5] OPTIMAL EXPECTED-TIME ALGORITHMS FOR CLOSEST POINT PROBLEMS
    BENTLEY, JL
    WEIDE, BW
    YAO, AC
    [J]. ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 1980, 6 (04): : 563 - 580
  • [6] MULTIDIMENSIONAL BINARY SEARCH TREES USED FOR ASSOCIATIVE SEARCHING
    BENTLEY, JL
    [J]. COMMUNICATIONS OF THE ACM, 1975, 18 (09) : 509 - 517
  • [7] Clarkson K. L., 1994, Proceedings of the Tenth Annual Symposium on Computational Geometry, P160, DOI 10.1145/177424.177609
  • [8] Cleary J. G., 1979, ACM Transactions on Mathematical Software, V5, P183, DOI 10.1145/355826.355832
  • [9] Feller W., 1968, INTRO PROBABILITY TH
  • [10] Friedman J. H., 1977, ACM Transactions on Mathematical Software, V3, P209, DOI 10.1145/355744.355745