ON THE ESTIMATION OF SMALL PROBABILITIES BY LEAVING-ONE-OUT

Cited by: 46
Authors
NEY, H [1 ]
ESSEN, U [1 ]
KNESER, R [1 ]
Affiliations
[1] PHILIPS GMBH,FORSCHUNGSLAB AACHEN,D-52066 AACHEN,GERMANY
Keywords
STOCHASTIC LANGUAGE MODELING; LEAVING-ONE-OUT; ZERO-FREQUENCY PROBLEM; MAXIMUM LIKELIHOOD ESTIMATION; GENERALIZATION CAPABILITY;
DOI
10.1109/34.476512
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we apply the leaving-one-out concept to the estimation of 'small' probabilities, i.e., the case where the number of training samples is much smaller than the number of possible classes. After deriving the Turing-Good formula in this framework, we introduce several specific models in order to avoid the problems of the original Turing-Good formula. These models are the constrained model, the absolute discounting model and the linear discounting model. These models are then applied to the problem of bigram-based stochastic language modeling. Experimental results are presented for a German and an English corpus.
Pages: 1202-1212
Page count: 11
Related papers
16 in total
[1]   A TREE-BASED STATISTICAL LANGUAGE MODEL FOR NATURAL-LANGUAGE SPEECH RECOGNITION [J].
BAHL, LR ;
BROWN, PF ;
DESOUZA, PV ;
MERCER, RL .
IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1989, 37 (07) :1001-1008
[2]   A MAXIMUM-LIKELIHOOD APPROACH TO CONTINUOUS SPEECH RECOGNITION [J].
BAHL, LR ;
JELINEK, F ;
MERCER, RL .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1983, 5 (02) :179-190
[3]  
Bell T.C., 1990, TEXT COMPRESSION
[4]  
Church K. W., 1991, Computer Speech and Language, V5, P19, DOI 10.1016/0885-2308(91)90016-J
[5]  
Efron B, 1982, JACKKNIFE BOOTSTRAP, DOI 10.1137/1.9781611970319
[6]   THE POPULATION FREQUENCIES OF SPECIES AND THE ESTIMATION OF POPULATION PARAMETERS [J].
GOOD, IJ .
BIOMETRIKA, 1953, 40 (3-4) :237-264
[7]  
Hart PE, 1973, PATTERN CLASSIFICATI, P271
[8]  
JELINEK F, 1985, IMPACT PROCESSING TE
[9]   WORD-FREQUENCY AND TEXT TYPE - SOME OBSERVATIONS BASED ON THE LOB CORPUS OF BRITISH ENGLISH-TEXTS [J].
JOHANSSON, S .
COMPUTERS AND THE HUMANITIES, 1985, 19 (01) :23-36
[10]   ESTIMATION OF PROBABILITIES FROM SPARSE DATA FOR THE LANGUAGE MODEL COMPONENT OF A SPEECH RECOGNIZER [J].
KATZ, SM .
IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1987, 35 (03) :400-401