A bit of progress in language modeling

Cited by: 186
Authors: Goodman, JT [1]
Affiliation: [1] Microsoft Corp, Machine Learning & Appl Stat Grp, Redmond, WA 98052 USA
DOI: 10.1006/csla.2001.0174
CLC classification: TP18 (Artificial intelligence theory)
Discipline codes: 081104; 0812; 0835; 1405
Abstract:
In the past several years, a number of different language modeling improvements over simple trigram models have been found, including caching, higher-order n-grams, skipping, interpolated Kneser-Ney smoothing, and clustering. We present explorations of variations on, or of the limits of, each of these techniques, including showing that sentence mixture models may have more potential. While all of these techniques have been studied separately, they have rarely been studied in combination. We compare a combination of all techniques together to a Katz smoothed trigram model with no count cutoffs. We achieve perplexity reductions between 38 and 50% (1 bit of entropy), depending on training data size, as well as a word error rate reduction of 8.9%. Our perplexity reductions are perhaps the highest reported compared to a fair baseline. (C) 2001 Academic Press.
Pages: 403-434
Page count: 32
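The abstract equates its largest perplexity reduction (about 50%) with "1 bit of entropy": since perplexity is 2 raised to the cross-entropy in bits per word, removing one bit halves the perplexity. A minimal sketch of that relationship, using illustrative numbers that are not taken from the paper:

```python
import math

def perplexity(cross_entropy_bits: float) -> float:
    """Perplexity is 2 raised to the cross-entropy measured in bits per word."""
    return 2.0 ** cross_entropy_bits

# Hypothetical baseline: a trigram model at 7.0 bits/word has perplexity 128.
baseline_bits = 7.0
baseline_ppl = perplexity(baseline_bits)

# Saving a full bit of entropy halves the perplexity, i.e. a 50% reduction,
# which is how the abstract's "1 bit of entropy" maps onto the perplexity scale.
improved_ppl = perplexity(baseline_bits - 1.0)
print(f"1 bit saved: perplexity {baseline_ppl:.0f} -> {improved_ppl:.0f} (50% reduction)")

# Conversely, the 38% perplexity reduction reported for smaller training sets
# corresponds to about log2(1 / 0.62) ~= 0.69 bits of entropy saved.
bits_saved = math.log2(1.0 / (1.0 - 0.38))
print(f"38% perplexity reduction ~= {bits_saved:.2f} bits")
```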