Exploiting latent semantic information in statistical language modeling

Cited: 208
Author
Bellegarda, JR [1 ]
Affiliation
[1] Apple Comp Inc, Spoken Language Grp, Cupertino, CA 95014 USA
Keywords
latent semantic analysis; multispan integration; n-grams; speech recognition; statistical language modeling;
DOI
10.1109/5.880084
Chinese Library Classification
TM [Electrotechnics]; TN [Electronic Technology, Communication Technology];
Subject Classification
0808; 0809
Abstract
Statistical language models used in large-vocabulary speech recognition must properly encapsulate the various constraints, both local and global, present in the language. While local constraints are readily captured through n-gram modeling, global constraints, such as long-term semantic dependencies, have been more difficult to handle within a data-driven formalism. This paper focuses on the use of latent semantic analysis, a paradigm that automatically uncovers the salient semantic relationships between words and documents in a given corpus. In this approach, (discrete) words and documents are mapped onto a (continuous) semantic vector space, in which familiar clustering techniques can be applied. This leads to the specification of a powerful framework for automatic semantic classification, as well as the derivation of several language model families with various smoothing properties. Because of their large-span nature, these language models are well suited to complement conventional n-grams. An integrative formulation is proposed for harnessing this synergy, in which the latent semantic information is used to adjust the standard n-gram probability. Such hybrid language modeling compares favorably with the corresponding n-gram baseline: experiments conducted on the Wall Street Journal domain show a reduction in average word error rate of over 20%. This paper concludes with a discussion of intrinsic tradeoffs, such as the influence of training data selection on the resulting performance.
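The following Python snippet is a minimal sketch of the latent semantic mapping the abstract describes, not the paper's implementation: a word-document matrix is factored by a truncated SVD so that words and documents land in one low-dimensional continuous space where closeness can be scored. The toy corpus, the raw-count weighting (the paper uses an entropy-weighted scheme), the rank R = 2, and the cosine score are all illustrative assumptions.

import numpy as np

# Toy corpus: each "document" is a list of word tokens (illustrative only).
corpus = [
    "stocks fell sharply on wall street".split(),
    "the market rallied as stocks rose".split(),
    "the quarterback threw a late touchdown".split(),
]

vocab = sorted({w for doc in corpus for w in doc})
w_index = {w: i for i, w in enumerate(vocab)}

# Word-document count matrix W (|V| x N); raw counts keep the sketch short.
W = np.zeros((len(vocab), len(corpus)))
for j, doc in enumerate(corpus):
    for w in doc:
        W[w_index[w], j] += 1.0

# Truncated SVD of rank R: W ~= U S V^T. Rows of U*S are word vectors and
# rows of V*S are document vectors, in the same R-dimensional semantic space.
R = 2
U, s, Vt = np.linalg.svd(W, full_matrices=False)
word_vecs = U[:, :R] * s[:R]
doc_vecs = Vt[:R, :].T * s[:R]

def cosine(a, b):
    # Cosine similarity between two latent-space vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Closeness of a word to a document in the latent space. A hybrid model along
# the lines sketched in the abstract would rescale the n-gram probability by a
# quantity derived from such a score and renormalize over the vocabulary; the
# exact integrative formulation is given in the paper itself.
print(cosine(word_vecs[w_index["stocks"]], doc_vecs[0]))     # topically close
print(cosine(word_vecs[w_index["touchdown"]], doc_vecs[0]))  # topically far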
Pages: 1279-1296
Page count: 18