Fast k most similar neighbor classifier for mixed data (tree k-MSN)

Cited by: 7
Authors
Hernandez-Rodriguez, Selene [1 ]
Martinez-Trinidad, J. Fco. [1]
Carrasco-Ochoa, J. Ariel [1]
Affiliation
[1] National Institute of Astrophysics, Optics and Electronics (INAOE), Department of Computer Science, Puebla 72840, Mexico
Keywords
Nearest neighbor rule; Fast k nearest neighbor search; Mixed data; Non-metric comparison functions; Search; Algorithm
DOI
10.1016/j.patcog.2009.08.014
CLC number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The k nearest neighbor (k-NN) classifier has been a widely used nonparametric technique in Pattern Recognition because of its simplicity and good performance. In order to decide the class of a new prototype, the k-NN classifier performs an exhaustive comparison between the prototype to classify and the prototypes in the training set T. However, when T is large, this exhaustive comparison is expensive. For this reason, many fast k-NN classifiers have been developed; some of them are based on a tree structure, which is built during a preprocessing phase from the prototypes in T. Then, in a search phase, the tree is traversed to find the nearest neighbor. The speed-up is obtained by avoiding the exploration of some parts of the tree, using pruning rules that are usually based on the triangle inequality. However, in soft sciences such as Medicine, Geology, and Sociology, the prototypes are usually described by numerical and categorical attributes (mixed data), and sometimes the comparison function used to compute the similarity between prototypes does not satisfy metric properties. Therefore, in this work an approximate fast k most similar neighbor classifier based on a tree structure (Tree k-MSN) is proposed for mixed data and similarity functions that do not satisfy metric properties. Experiments with synthetic and real data are presented. (C) 2009 Elsevier Ltd. All rights reserved.
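To make the procedure described in the abstract concrete, the following Python sketch shows an exhaustive k most similar neighbor (k-MSN) search over mixed numerical/categorical data with a simple HEOM-style similarity. The similarity function, all identifiers (mixed_similarity, k_msn_classify), and the toy training set are assumptions made for this illustration; they are not the comparison function or the Tree k-MSN algorithm proposed in the paper. The sketch only shows the brute-force baseline that a tree-based classifier is designed to speed up.

from collections import Counter
import heapq


def mixed_similarity(x, y, numeric_idx, ranges):
    # Similarity in [0, 1]: range-normalized closeness on numeric attributes,
    # exact match on categorical ones (a HEOM-like choice assumed for this
    # sketch; the paper's comparison function may differ and need not be a metric).
    total = 0.0
    for i, (a, b) in enumerate(zip(x, y)):
        if i in numeric_idx:
            r = ranges[i] or 1.0          # guard against a zero range
            total += 1.0 - min(abs(a - b) / r, 1.0)
        else:
            total += 1.0 if a == b else 0.0
    return total / len(x)


def k_msn_classify(query, prototypes, labels, k, numeric_idx, ranges):
    # Exhaustive k-MSN: compare the query against every prototype in T and
    # vote among the k most similar ones (the full scan a tree-based search avoids).
    sims = ((mixed_similarity(query, p, numeric_idx, ranges), lbl)
            for p, lbl in zip(prototypes, labels))
    top_k = heapq.nlargest(k, sims, key=lambda s: s[0])
    return Counter(lbl for _, lbl in top_k).most_common(1)[0][0]


if __name__ == "__main__":
    # Toy mixed-data training set T: (age, blood type, smoker) -> class
    T = [(25, "A", "no"), (60, "O", "yes"), (33, "A", "no"), (58, "B", "yes")]
    y = ["healthy", "ill", "healthy", "ill"]
    numeric_idx = {0}
    ranges = {0: 60 - 25}                 # per-attribute range over T
    print(k_msn_classify((30, "A", "no"), T, y, k=3,
                         numeric_idx=numeric_idx, ranges=ranges))

A tree-based method replaces this full scan with a traversal that prunes subtrees, but because a similarity such as the one above is not required to be a metric, triangle-inequality pruning cannot be applied directly; this is the issue the proposed Tree k-MSN addresses.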
Pages: 873-886
Number of pages: 14