An empirical study of smoothing techniques for language modeling

Cited by: 548
Authors
Chen, SF
Goodman, J
Affiliations
[1] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
[2] Microsoft Res, Redmond, WA 98052 USA
Funding
National Science Foundation (USA);
DOI
10.1006/csla.1999.0128
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We survey the most widely-used algorithms for smoothing models for language n-gram modeling. We then present an extensive empirical comparison of several of these smoothing techniques, including those described by Jelinek and Mercer (1980); Katz (1987); Bell, Cleary and Witten (1990); Ney, Essen and Kneser (1994), and Kneser and Ney (1995). We investigate how factors such as training data size, training corpus (e.g. Brown vs. Wall Street Journal), count cutoffs, and n-gram order (bigram vs. trigram) affect the relative performance of these methods, which is measured through the cross-entropy of test data. We find that these factors can significantly affect the relative performance of models, with the most significant factor being training data size. Since no previous comparisons have examined these factors systematically, this is the first thorough characterization of the relative performance of various algorithms. In addition, we introduce methodologies for analyzing smoothing algorithm efficacy in detail, and using these techniques we motivate a novel variation of Kneser-Ney smoothing that consistently outperforms all other algorithms evaluated. Finally, results showing that improved language model smoothing leads to improved speech recognition performance are presented. © 1999 Academic Press.
Pages: 359-394
Page count: 36
Related papers
52 in total
[1] [Anonymous], 1992, AAAI S PROB APPR NAT
[2] [Anonymous], 1998, DARPA BROADC NEWS TR
[3] [Anonymous], THESIS STANFORD U
[4] Bahl, L. R.; Brown, P. F.; de Souza, P. V.; Mercer, R. L. A tree-based statistical language model for natural-language speech recognition [J]. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1989, 37(7): 1001-1008
[5] Bahl, L. R.; Jelinek, F.; Mercer, R. L. A maximum-likelihood approach to continuous speech recognition [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1983, 5(2): 179-190
[6] Baum L. E., 1972, Inequalities III: Proceedings of the Third Symposium on Inequalities, V3, P1
[7] Bell T. C., 1990, TEXT COMPRESSION
[8] Brown P. F., 1992, Computational Linguistics, V18, P467
[9] Brown P. F., 1990, Computational Linguistics, V16, P79
[10] BROWN PF, 1992, AM J COMPUTATIONAL L, V18, P31