Interpolation of n-gram and mutual-information based trigger pair language models for Mandarin speech recognition

Cited by: 3
Authors
Zhou, GD [1]
Lua, KT [1]
Affiliations
[1] Natl Univ Singapore, Sch Comp, Dept Comp Sci, Singapore 119260, Singapore
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
While n-gram modeling is simple and dominant in speech recognition, it can capture only short-distance context dependencies within an n-word window, and the largest practical n for natural language is currently three. However, many context dependencies in natural language extend beyond a three-word window. This paper proposes a new language modeling approach that captures the preferred relationships between words over short or long distances through the concept of MI-Trigger pairs. Different MI-Trigger-based models are constructed in either a distance-dependent or a distance-independent way within a window of 1 to 10 words. MI-Trigger-based modeling is also compared and merged with word bigram modeling, and it is found to perform better than word bigram modeling. The n-gram and MI-Trigger models are also found to be complementary: merging them properly further increases the recognition rate when tested on Mandarin speech recognition. One advantage of MI-Trigger-based modeling is that it needs far fewer parameters than word bigram modeling. Another is that the number of trigger pairs in an MI-Trigger model can be kept to a reasonable size without losing too much modeling power. © 1999 Academic Press.
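The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea, under stated assumptions: trigger pairs are scored with pointwise mutual information in a distance-independent way inside a fixed window, and the merged model is a simple linear interpolation. The function names (`trigger_pair_scores`, `top_trigger_pairs`, `interpolate`), the choice of pointwise MI, and the weight `lam` are all illustrative and are not taken from the paper.

```python
import math
from collections import Counter


def trigger_pair_scores(sentences, window=10):
    """Score ordered pairs (a, b) where b follows a within `window` words.

    Assumption: pointwise mutual information, log P(a, b) / (P(a) P(b)),
    estimated from raw counts without smoothing.
    """
    word_counts = Counter()
    pair_counts = Counter()
    for words in sentences:
        word_counts.update(words)
        for i, a in enumerate(words):
            # Distance-independent: every position in the window counts equally.
            for b in words[i + 1 : i + 1 + window]:
                pair_counts[(a, b)] += 1

    total_words = sum(word_counts.values())
    total_pairs = sum(pair_counts.values())
    scores = {}
    for (a, b), n_ab in pair_counts.items():
        p_ab = n_ab / total_pairs
        p_a = word_counts[a] / total_words
        p_b = word_counts[b] / total_words
        scores[(a, b)] = math.log(p_ab / (p_a * p_b))
    return scores


def top_trigger_pairs(scores, k):
    """Keep only the k highest-scoring pairs to bound the model size."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def interpolate(p_ngram, p_trigger, lam):
    """Linearly merge two probability estimates; lam weights the n-gram model."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_ngram + (1.0 - lam) * p_trigger
```

In this sketch, pruning with `top_trigger_pairs` mirrors the abstract's point that the number of trigger pairs can be kept small, and `lam` would normally be tuned on held-out data.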
Pages: 125-141
Page count: 17