Combination of words and word categories in varigram histories

Cited by: 2
Author
Blasig, R [1]
Affiliation
[1] Philips Res Labs, D-52066 Aachen, Germany
Source
ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI | 1999
DOI
10.1109/ICASSP.1999.758179
Chinese Library Classification number
O42 [Acoustics]
Subject classification code
070206; 082403
Abstract
This paper presents a new kind of language model: the category/word varigram. This model type permits a tight integration of word-based and category-based modeling of word sequences. Any succession of words and word categories may be employed to describe a given word history, which provides much greater flexibility than previous combinations of word-based and category-based language models. Experiments on the WSJ0 corpus and the 1994 ARPA evaluation data indicate that the category/word varigram yields a perplexity reduction of up to 10 percent compared to a word varigram of the same size, and improves the word error rate (WER) by 7 percent. Compared to a linear interpolation of a word-based and a category-based n-gram, the WER improvement is about 4 percent.
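As a rough illustration of what "any succession of words and word categories" can mean for a single history, the sketch below enumerates every mixed description of a short word history. The word-to-category map, the category labels, and the function name are hypothetical and are not taken from the paper; the paper's own procedure for selecting which mixed histories enter the model is not described in this record.

```python
from itertools import product

# Hypothetical word-to-category map, for illustration only.
CATEGORY = {"the": "DET", "stock": "NOUN", "market": "NOUN"}


def mixed_histories(history):
    """List every description of `history` in which each position
    is represented either by the word itself or by its category."""
    # For each position, the two candidate symbols: (word, category).
    options = [(w, CATEGORY.get(w, "UNK")) for w in history]
    # Cartesian product over positions yields all 2**len(history) mixes.
    return [tuple(choice) for choice in product(*options)]


histories = mixed_histories(("the", "stock"))
for h in histories:
    print(h)
```

A two-word history therefore has four candidate descriptions, from purely word-based to purely category-based. A real varigram would keep only a subset of such histories, chosen by some selection criterion, instead of enumerating all of them.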
Pages: 529-532
Page count: 4