Exploiting latent semantic information in statistical language modeling

Cited: 208
Author
Bellegarda, JR [1 ]
Affiliation
[1] Apple Comp Inc, Spoken Language Grp, Cupertino, CA 95014 USA
Keywords
latent semantic analysis; multispan integration; n-grams; speech recognition; statistical language modeling;
DOI
10.1109/5.880084
Chinese Library Classification
TM [Electrotechnics]; TN [Electronic Technology, Communication Technology];
Subject Classification
0808; 0809
Abstract
Statistical language models used in large-vocabulary speech recognition must properly encapsulate the various constraints, both local and global, present in the language. While local constraints are readily captured through n-gram modeling, global constraints, such as long-term semantic dependencies, have been more difficult to handle within a data-driven formalism. This paper focuses on the use of latent semantic analysis, a paradigm that automatically uncovers the salient semantic relationships between words and documents in a given corpus. In this approach, (discrete) words and documents are mapped onto a (continuous) semantic vector space, in which familiar clustering techniques can be applied. This leads to the specification of a powerful framework for automatic semantic classification, as well as the derivation of several language model families with various smoothing properties. Because of their large-span nature, these language models are well suited to complement conventional n-grams. An integrative formulation is proposed for harnessing this synergy, in which the latent semantic information is used to adjust the standard n-gram probability. Such hybrid language modeling compares favorably with the corresponding n-gram baseline: experiments conducted on the Wall Street Journal domain show a reduction in average word error rate of over 20%. This paper concludes with a discussion of intrinsic tradeoffs, such as the influence of training data selection on the resulting performance.
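The following Python snippet is a minimal sketch of the latent semantic mapping the abstract describes, not the paper's implementation: a word-document matrix is factored by a truncated SVD so that words and documents land in one low-dimensional continuous space where closeness can be scored. The toy corpus, the raw-count weighting (the paper uses an entropy-weighted scheme), the rank R = 2, and the cosine score are all illustrative assumptions.

import numpy as np

# Toy corpus: each "document" is a list of word tokens (illustrative only).
corpus = [
    "stocks fell sharply on wall street".split(),
    "the market rallied as stocks rose".split(),
    "the quarterback threw a late touchdown".split(),
]

vocab = sorted({w for doc in corpus for w in doc})
w_index = {w: i for i, w in enumerate(vocab)}

# Word-document count matrix W (|V| x N); raw counts keep the sketch short.
W = np.zeros((len(vocab), len(corpus)))
for j, doc in enumerate(corpus):
    for w in doc:
        W[w_index[w], j] += 1.0

# Truncated SVD of rank R: W ~= U S V^T. Rows of U*S are word vectors and
# rows of V*S are document vectors, in the same R-dimensional semantic space.
R = 2
U, s, Vt = np.linalg.svd(W, full_matrices=False)
word_vecs = U[:, :R] * s[:R]
doc_vecs = Vt[:R, :].T * s[:R]

def cosine(a, b):
    # Cosine similarity between two latent-space vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Closeness of a word to a document in the latent space. A hybrid model along
# the lines sketched in the abstract would rescale the n-gram probability by a
# quantity derived from such a score and renormalize over the vocabulary; the
# exact integrative formulation is given in the paper itself.
print(cosine(word_vecs[w_index["stocks"]], doc_vecs[0]))     # topically close
print(cosine(word_vecs[w_index["touchdown"]], doc_vecs[0]))  # topically far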
Pages: 1279-1296
Page count: 18