On nearest neighbor classification using adaptive choice of k

Cited by: 28
Authors
Ghosh, Anil K. [1]
Affiliations
[1] Indian Inst Technol, Dept Math & Stat, Kanpur 208016, Uttar Pradesh, India
Keywords
Bayesian strength function; cross-validation; misclassification rate; noninformative prior; optimal Bayes risk; posterior probability; p value; robustness
DOI
10.1198/106186007x208380
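Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract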
Nearest neighbor classification is one of the simplest and most popular methods for statistical pattern recognition. It assigns an observation x to the class that is most frequent in the neighborhood of x. The size of this neighborhood is usually determined by a predefined parameter k. Normally, one uses cross-validation to estimate the optimum value of this parameter, and that estimated value is used for classifying all observations. However, in classification problems, a good choice of k depends not only on the training sample but also on the specific observation to be classified. Therefore, instead of using a fixed value of k over the entire measurement space, a spatially adaptive choice of k may be more useful in practice. This article presents one such adaptive nearest neighbor classification technique, where the value of k is selected depending on the distribution of competing classes in the vicinity of the observation to be classified. The utility of the proposed method is illustrated using simulated examples and well-known benchmark datasets. Asymptotic optimality of its misclassification rate is derived under appropriate regularity conditions.
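The abstract does not state the selection rule itself, so the following is only a minimal sketch of what a per-observation choice of k could look like, not the article's algorithm. It assumes two classes labeled 0 and 1, Euclidean distance, and a uniform (noninformative) prior on the local class-1 proportion p; for each candidate k it computes the posterior probability that the majority class really is the more likely one near x, and keeps the k where that probability is largest. The names `strength`, `adaptive_knn_predict`, and `k_values` are hypothetical.

```python
import math
import numpy as np


def strength(n1, k):
    """Posterior P(p > 0.5) when n1 of the k nearest neighbors are class 1.

    Assumption: uniform prior on p, so the posterior is Beta(n1 + 1, k - n1 + 1).
    For integer parameters this tail equals P(Binomial(k + 1, 0.5) <= n1).
    """
    return sum(math.comb(k + 1, i) for i in range(n1 + 1)) / 2 ** (k + 1)


def adaptive_knn_predict(X_train, y_train, x, k_values):
    """Classify x (labels 0/1) with the candidate k whose vote is most decisive.

    Returns (label, chosen_k, posterior_strength).
    """
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances to x
    order = np.argsort(dists, kind="stable")      # neighbors, nearest first
    best_label, best_k, best_strength = None, None, -1.0
    for k in k_values:
        n1 = int(y_train[order[:k]].sum())        # class-1 votes among k nearest
        s1 = strength(n1, k)
        # Evidence for class 0 is the complement of the evidence for class 1.
        label, s = (1, s1) if s1 >= 0.5 else (0, 1.0 - s1)
        if s > best_strength:
            best_label, best_k, best_strength = label, k, s
    return best_label, best_k, best_strength
```

Calling `adaptive_knn_predict(X_train, y_train, x, [1, 3, 5])` returns the predicted label together with the k that was selected for that particular x, so different query points can end up using different neighborhood sizes.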
Pages: 482-502
Number of pages: 21