ON STRUCTURING PROBABILISTIC DEPENDENCES IN STOCHASTIC LANGUAGE MODELING

Cited by: 211
Authors
NEY, H
ESSEN, U
KNESER, R
Affiliation
[1] Philips GmbH Forschungslaboratorien Aachen, D-52021 Aachen
Keywords
DOI
10.1006/csla.1994.1001
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we study the problem of stochastic language modelling from the viewpoint of introducing suitable structures into the conditional probability distributions. The task of these distributions is to predict the probability of a new word by looking at M or even all predecessor words. The conventional approach is to limit M to 1 or 2 and to interpolate the resulting bigram and trigram models with a unigram model in a linear fashion. However, there are many other structures that can be used to model the probabilistic dependences between the predecessor word and the word to be predicted. The structures considered in this paper are: nonlinear interpolation as an alternative to linear interpolation; equivalence classes for word histories and single words; cache memory and word associations. For the optimal estimation of nonlinear and linear interpolation parameters, the leaving-one-out method is systematically used. For the determination of word equivalence classes in a bigram model, an automatic clustering procedure has been adapted. To capture long-distance dependences, we consider various models for word-by-word dependences; the cache model may be viewed as a special type of self-association. Experimental results are presented for two text databases, a German database and an English database.
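As an illustration of the conventional approach mentioned in the abstract, the following is a minimal sketch of a bigram model linearly interpolated with a unigram model. It is not taken from the paper: the function names, the toy corpus, and the fixed weight `lam=0.7` are assumptions made for the example, and the abstract states that the interpolation parameters are estimated by leaving-one-out rather than set by hand.

```python
from collections import Counter


def train_counts(tokens):
    """Collect unigram, history, and bigram counts from a list of tokens."""
    unigrams = Counter(tokens)
    # A word counts as a history only when another word follows it.
    histories = Counter(tokens[:-1])
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, histories, bigrams


def interpolated_prob(word, prev, unigrams, histories, bigrams, lam=0.7):
    """Return lam * f(word | prev) + (1 - lam) * f(word).

    f denotes a relative frequency. lam is a hand-picked illustrative
    weight (an assumption), not a value estimated by leaving-one-out.
    """
    p_unigram = unigrams[word] / sum(unigrams.values())
    if histories[prev] == 0:
        # Unseen history: fall back to the unigram model alone so the
        # distribution over the vocabulary still sums to one.
        return p_unigram
    p_bigram = bigrams[(prev, word)] / histories[prev]
    return lam * p_bigram + (1 - lam) * p_unigram


if __name__ == "__main__":
    corpus = "the cat sat on the mat".split()
    uni, hist, bi = train_counts(corpus)
    print(interpolated_prob("cat", "the", uni, hist, bi))
```

Because both component distributions are normalized, the interpolated probabilities for a fixed history sum to one over the vocabulary, which is what allows the unigram term to assign nonzero mass to bigrams never seen in training.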
Pages: 1-38
Page count: 38