The relative value of labeled and unlabeled samples in pattern recognition with an unknown mixing parameter

Times cited: 118
Authors
Castelli, V [1 ]
Cover, TM [1 ]
Affiliation
[1] STANFORD UNIV,DEPT ELECT ENGN & STAT,STANFORD,CA 94305
Funding
U.S. National Science Foundation;
Keywords
pattern recognition; supervised learning; unsupervised learning; labeled and unlabeled samples; Bayesian method; Laplace's integral; asymptotic theory;
DOI
10.1109/18.556600
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
We observe a training set $Q$ composed of $l$ labeled samples $\{(X_1, \theta_1), \ldots, (X_l, \theta_l)\}$ and $u$ unlabeled samples $\{X'_1, \ldots, X'_u\}$. The labels $\theta_i$ are independent random variables satisfying $\Pr\{\theta_i = 1\} = \eta$, $\Pr\{\theta_i = 2\} = 1 - \eta$. The labeled observations $X_i$ are independently distributed with conditional density $f_{\theta_i}(\cdot)$ given $\theta_i$. Let $(X_0, \theta_0)$ be a new sample, independently distributed as the samples in the training set. We observe $X_0$ and we wish to infer the classification $\theta_0$. In this paper we first assume that the distributions $f_1(\cdot)$ and $f_2(\cdot)$ are given and that the mixing parameter $\eta$ is unknown. We show that the relative value of labeled and unlabeled samples in reducing the risk of optimal classifiers is the ratio of the Fisher informations they carry about the parameter $\eta$. We then assume that two densities $g_1(\cdot)$ and $g_2(\cdot)$ are given, but we know neither whether $g_1(\cdot) = f_1(\cdot)$ and $g_2(\cdot) = f_2(\cdot)$ or the opposite holds, nor the value of $\eta$. Thus the learning problem consists of both estimating the optimum partition of the observation space and assigning the classifications to the decision regions. Here, we show that labeled samples are necessary to construct a classification rule and that they are exponentially more valuable than unlabeled samples.
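To make the first result concrete, the following is a minimal Python sketch (not the authors' code) of the two per-sample Fisher informations about $\eta$. A labeled sample $(X, \theta)$ carries $I_l(\eta) = 1/(\eta(1-\eta))$, because $f_1$ and $f_2$ do not depend on $\eta$ and only the Bernoulli label does. An unlabeled sample has mixture density $p(x) = \eta f_1(x) + (1-\eta) f_2(x)$ and carries $I_u(\eta) = \int (f_1 - f_2)^2 / p \, dx$. The Gaussian choices $f_1 = N(1, 1)$, $f_2 = N(-1, 1)$, the value $\eta = 0.3$, the integration range, and all function names are illustrative assumptions. The sketch only computes the ratio $I_u / I_l$. It does not reproduce the paper's asymptotic risk analysis.

```python
from functools import partial
import math


def normal_pdf(x, mean, std):
    """Gaussian density N(mean, std^2); an illustrative (assumed) choice of f_1, f_2."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fisher_info_labeled(eta):
    """Fisher information about eta carried by one labeled sample (X, theta).

    The densities f_1, f_2 do not depend on eta, so only the Bernoulli label
    contributes: I_l(eta) = 1 / (eta * (1 - eta)).
    """
    return 1.0 / (eta * (1.0 - eta))


def fisher_info_unlabeled(eta, f1, f2, lo=-20.0, hi=20.0, n=20_000):
    """Fisher information about eta carried by one unlabeled sample.

    X' has mixture density p(x) = eta*f1(x) + (1 - eta)*f2(x), whose score is
    (f1 - f2) / p, so I_u(eta) = integral of (f1 - f2)^2 / p over x. The
    integral is approximated with the composite trapezoidal rule on [lo, hi].
    """
    h = (hi - lo) / n
    total = 0.0
    for k in range(n + 1):
        x = lo + k * h
        a, b = f1(x), f2(x)
        mix = eta * a + (1.0 - eta) * b
        # Skip points where the mixture density underflows to zero.
        term = (a - b) ** 2 / mix if mix > 0.0 else 0.0
        weight = 0.5 if k in (0, n) else 1.0  # trapezoid end-point weights
        total += weight * term
    return total * h


if __name__ == "__main__":
    eta = 0.3  # hypothetical true mixing parameter
    f1 = partial(normal_pdf, mean=1.0, std=1.0)
    f2 = partial(normal_pdf, mean=-1.0, std=1.0)
    i_l = fisher_info_labeled(eta)
    i_u = fisher_info_unlabeled(eta, f1, f2)
    print(f"I_l = {i_l:.4f}, I_u = {i_u:.4f}, I_u / I_l = {i_u / i_l:.4f}")
```

Since an unlabeled sample is the labeled sample with $\theta$ discarded, $I_u \le I_l$. Pulling the two means far apart drives the ratio toward 1, because $X'$ then reveals its class almost surely. Identical densities make $I_u = 0$.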
Pages: 2102-2117
Number of pages: 16