A bit of progress in language modeling

Cited by: 186
Author
Goodman, JT [1]
Affiliation
[1] Microsoft Corp, Machine Learning & Appl Stat Grp, Redmond, WA 98052 USA
DOI
10.1006/csla.2001.0174
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In the past several years, a number of different language modeling improvements over simple trigram models have been found, including caching, higher-order n-grams, skipping, interpolated Kneser-Ney smoothing, and clustering. We present explorations of variations on, or of the limits of, each of these techniques, including showing that sentence mixture models may have more potential. While all of these techniques have been studied separately, they have rarely been studied in combination. We compare a combination of all techniques together to a Katz smoothed trigram model with no count cutoffs. We achieve perplexity reductions between 38 and 50% (1 bit of entropy), depending on training data size, as well as a word error rate reduction of 8.9%. Our perplexity reductions are perhaps the highest reported compared to a fair baseline. (C) 2001 Academic Press.
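The abstract's figure of "1 bit of entropy" for a 38-50% perplexity reduction follows from the standard relation perplexity = 2^(cross-entropy in bits per word). A minimal Python sketch (not from the paper; the function name and example perplexities are illustrative) of that conversion:

import math

def entropy_reduction_bits(baseline_ppl: float, improved_ppl: float) -> float:
    # Cross-entropy in bits per word is log2(perplexity), so the entropy
    # saved by an improved model is the difference of the two logarithms.
    return math.log2(baseline_ppl) - math.log2(improved_ppl)

# Halving perplexity (a 50% reduction) saves exactly one bit per word:
print(entropy_reduction_bits(100.0, 50.0))  # 1.0
# A 38% reduction saves roughly 0.69 bits per word:
print(entropy_reduction_bits(100.0, 62.0))  # ~0.69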
Pages: 403-434
Number of pages: 32