Classifier performance as a function of distributional complexity

Cited by: 9
Authors
Attoor, SN
Dougherty, ER [1]
Affiliations
[1] Texas A&M Univ, Dept Elect Engn, College Stn, TX 77843 USA
[2] Univ Texas, MD Anderson Canc Ctr, Dept Pathol, Houston, TX 77030 USA
Keywords
classifier design; classifier dimension; distributional complexity; small samples
DOI
10.1016/j.patcog.2003.10.013
CLC number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
When choosing a classification rule, it is important to take into account the amount of sample data available. This paper examines the performance of classifiers of differing complexities in relation to the complexity of feature-label distributions in the case of small samples. We define the distributional complexity of a feature-label distribution to be the minimal number of hyperplanes necessary to achieve the Bayes classifier if the Bayes classifier is achievable by a finite number of hyperplanes, and infinity otherwise. Our approach is to choose a model and compare classifier efficiencies for various sample sizes and distributional complexities, with simulation results obtained by generating data from the model at each distributional complexity. A linear support vector machine (SVM) is considered, along with several nonlinear classifiers. For the most part, we see little improvement when a complex classifier is used instead of a linear SVM. For higher levels of distributional complexity, the linear classifier degrades, but so do the more complex classifiers owing to insufficient training data. Hence, if one obtains a good result with a more complex classifier, it is most likely that the distributional complexity is low and there is no gain over using a linear classifier. It follows that, under the model, it is generally impossible to claim that use of the nonlinear classifier is beneficial; in essence, the sample sizes are too small to take advantage of the added complexity. An exception to this observation is the behavior of the three-nearest-neighbor (3NN) classifier in the case of two variables (but not three) when there is very little overlap between the label distributions and the sample size is not too small. With a sample size of 60, the 3NN classifier performs close to the Bayes classifier, even for high levels of distributional complexity. Consequently, if one uses the 3NN classifier with two variables and obtains a low error, then the distributional complexity might be large and, if such is the case, there is a significant gain over using a linear classifier. (C) 2003 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
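The comparison described in the abstract (a linear SVM versus more complex rules, trained on small samples drawn from distributions of increasing complexity) can be illustrated with a minimal simulation sketch. The synthetic "striped" model below, the function names (striped_data, average_errors), and the flip parameter are illustrative assumptions, not the authors' actual model: distributional complexity k is realized as k parallel hyperplanes separating alternating-label stripes, and a small fraction of labels is flipped to create class overlap. Python with NumPy and scikit-learn.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def striped_data(n, complexity, dim=2, flip=0.05):
    # Features uniform on [-1, 1]^dim; the Bayes boundary consists of
    # `complexity` parallel hyperplanes perpendicular to the first axis,
    # so adjacent stripes alternate labels.  A fraction `flip` of labels
    # is flipped to mimic class overlap (an assumed stand-in model).
    X = rng.uniform(-1.0, 1.0, size=(n, dim))
    stripe = np.floor((X[:, 0] + 1.0) * (complexity + 1) / 2.0).astype(int)
    y = stripe % 2
    noisy = rng.random(n) < flip
    y[noisy] = 1 - y[noisy]
    return X, y

def average_errors(n_train=60, complexity=3, n_test=5000, trials=50):
    # Average test error of a linear SVM and 3NN over repeated small
    # training samples, at a fixed distributional complexity.
    errs = {"linear SVM": [], "3NN": []}
    for _ in range(trials):
        Xtr, ytr = striped_data(n_train, complexity)
        Xte, yte = striped_data(n_test, complexity)
        for name, clf in (("linear SVM", LinearSVC()),
                          ("3NN", KNeighborsClassifier(n_neighbors=3))):
            clf.fit(Xtr, ytr)
            errs[name].append(float(np.mean(clf.predict(Xte) != yte)))
    return {name: sum(e) / len(e) for name, e in errs.items()}

if __name__ == "__main__":
    print("complexity 1:", average_errors(complexity=1))
    print("complexity 5:", average_errors(complexity=5))

Under this sketch one would expect the pattern reported in the abstract: at complexity 1 both rules sit near the Bayes error, while at higher complexity the linear SVM's error grows and 3NN in two dimensions can stay closer to the Bayes classifier when the training sample is around 60.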
Pages: 1641-1651
Page count: 11