ON THE EXPONENTIAL VALUE OF LABELED SAMPLES

Cited by: 113
Authors
CASTELLI, V [1]
COVER, TM [1]
Affiliations
[1] STANFORD UNIV, DEPT ELECT ENGN, INFORMAT SYST LAB, STANFORD, CA 94305
Funding
National Science Foundation (USA)
DOI
10.1016/0167-8655(94)00074-D
CLC classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Consider the problem of classifying a sample $X_0$ into one of two classes, using a training set $Q$. Let $Q$ be composed of $l$ labeled samples $\{(X_1,\theta_1),\ldots,(X_l,\theta_l)\}$ and $u$ unlabeled samples $\{X'_1,\ldots,X'_u\}$, where the labels $\theta_i$ are i.i.d. Bernoulli($\eta$) random variables over the set $\{1,2\}$, the observations $\{X_i\}_{i=1}^{l}$ are distributed according to $f_{\theta_i}(\cdot)$, and the unlabeled observations $\{X'_j\}_{j=1}^{u}$ are independently distributed according to the mixture density $f_{X'}(\cdot)=\eta f_1(\cdot)+(1-\eta)f_2(\cdot)$. We assume that $f_1(\cdot)$, $f_2(\cdot)$ and $\eta$ are all unknown. Let $f_1(\cdot)$ and $f_2(\cdot)$ belong to a known family $\mathcal{F}$, and assume that the mixtures of elements of $\mathcal{F}$ are identifiable. Even when the number of unlabeled samples is infinite and the decision regions can therefore be identified, one still needs labeled samples to label the decision regions with the correct classification. Letting $R(l,u)$ denote the optimal probability of error for $l$ labeled and $u$ unlabeled samples, and assuming that the pairwise mixtures of $\mathcal{F}$ are identifiable, we obtain the obvious statements $R(0,u)=R(0,\infty)=1/2$, $R(1,0)\le 2\eta(1-\eta)$, $R(\infty,u)=R^*$, and then prove $R(1,\infty)=2R^*(1-R^*)$, where $R^*$ is the Bayes probability of error, and $R(l,\infty)=R^*+\exp\{-\alpha l+o(l)\}$, where the exponent is given by $\alpha=-\log\bigl(2\sqrt{\eta(1-\eta)}\int\sqrt{f_1(x)f_2(x)}\,dx\bigr)$. Thus the first labeled sample reduces the risk from $1/2$ to $2R^*(1-R^*)$, and subsequent labeled samples in the training set reduce the probability of error exponentially fast to the Bayes risk.
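A note on the one-labeled-sample result: with infinitely many unlabeled samples the two decision regions can be identified, and the single labeled sample assigns them the correct labels with probability $1-R^*$ (incurring risk $R^*$) and the wrong labels with probability $R^*$ (incurring risk $1-R^*$), giving $R(1,\infty)=R^*(1-R^*)+(1-R^*)R^*=2R^*(1-R^*)$. As a hedged illustration of the exponent $\alpha$ (not part of the paper), the following minimal Python sketch evaluates $\alpha$ for two unit-variance Gaussian class densities; the means mu1, mu2 and the prior eta are illustrative assumptions, and the closed-form check uses the known Bhattacharyya coefficient $\exp\{-(\mu_1-\mu_2)^2/8\}$ for equal-variance Gaussians.

```python
# Minimal sketch (illustrative assumptions, not from the paper): evaluate the
# error exponent alpha = -log(2*sqrt(eta*(1-eta)) * integral sqrt(f1*f2) dx)
# for two unit-variance Gaussian class densities.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

eta = 0.5            # prior P(theta = 1); assumed value for illustration
mu1, mu2 = 0.0, 2.0  # class means; assumed values for illustration

f1 = norm(mu1, 1.0).pdf
f2 = norm(mu2, 1.0).pdf

# Bhattacharyya-type integral of sqrt(f1 * f2) over the real line.
bhatt, _ = quad(lambda x: np.sqrt(f1(x) * f2(x)), -np.inf, np.inf)

alpha = -np.log(2.0 * np.sqrt(eta * (1.0 - eta)) * bhatt)

# Closed form for equal-variance Gaussians: integral = exp(-(mu1-mu2)^2 / 8).
alpha_closed = -np.log(2.0 * np.sqrt(eta * (1.0 - eta))) + (mu1 - mu2) ** 2 / 8.0

print(f"alpha (numerical):   {alpha:.6f}")      # ~0.5 for these parameters
print(f"alpha (closed form): {alpha_closed:.6f}")
```

With these illustrative parameters $2\sqrt{\eta(1-\eta)}=1$, so $\alpha=(\mu_1-\mu_2)^2/8=0.5$ and the excess risk $R(l,\infty)-R^*$ decays roughly like $e^{-l/2}$.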
Pages: 105-111
Page count: 7