SMALL SAMPLE-SIZE EFFECTS IN STATISTICAL PATTERN-RECOGNITION - RECOMMENDATIONS FOR PRACTITIONERS

Cited by: 916
Authors
RAUDYS, SJ [1]
JAIN, AK [1]
Affiliations
[1] MICHIGAN STATE UNIV, DEPT COMP SCI, E LANSING, MI 48824 USA
Keywords
CLASSIFICATION ERROR; CLASSIFIER DESIGN; CURSE OF DIMENSIONALITY; FEATURE SELECTION; STATISTICAL PATTERN RECOGNITION; TEST SAMPLES; TRAINING SAMPLES;
DOI
10.1109/34.75512
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
During the last two decades a considerable amount of effort has been devoted to the analysis of the influence of both training and testing sample size on the design and performance of pattern recognition systems. These questions are interesting to practitioners as well as theoreticians, because the small-sample effects can easily contaminate the design and evaluation of a proposed system. For applications with a large number of features and a complex classification rule, the training sample size must be quite large. A large test sample is required to accurately evaluate a classifier with a low error rate. The design of a pattern recognition system consists of several stages: data collection, formation of the pattern classes, feature selection, specification of the classification algorithm, and estimation of the classification error. In this paper, we will discuss the effects of sample size on feature selection and error estimation for several types of classifier. In addition to surveying prior work in this area, our emphasis is on giving practical advice to today's designers and users of statistical pattern recognition systems.
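The abstract's point that a low error rate needs a large test sample can be illustrated with a short calculation. The sketch below is not taken from the paper: it assumes independent test samples, so that the number of misclassifications is binomial, and the function names `error_estimate_std` and `required_test_size` are hypothetical. Under that assumption the error estimate has standard deviation sqrt(p(1 - p) / n), and requiring this to be at most a fraction r of p gives n >= (1 - p) / (p r^2).

```python
import math


def error_estimate_std(true_error: float, n_test: int) -> float:
    """Standard deviation of the test-set error estimate (binomial model, assumed)."""
    return math.sqrt(true_error * (1.0 - true_error) / n_test)


def required_test_size(true_error: float, relative_std: float) -> int:
    """Smallest n_test (up to rounding) with std <= relative_std * true_error."""
    n = (1.0 - true_error) / (true_error * relative_std ** 2)
    return math.ceil(n)


if __name__ == "__main__":
    # The lower the error rate, the more test samples the same relative precision needs.
    for p in (0.10, 0.01, 0.001):
        n = required_test_size(p, relative_std=0.1)
        std = error_estimate_std(p, n)
        print(f"error rate {p}: about {n} test samples (std of estimate {std:.5f})")
```

Each tenfold drop in the error rate raises the required test size roughly tenfold at the same relative precision, which is the effect the abstract describes for classifiers with a low error rate.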
Pages: 252-264
Page count: 13