Discriminative Analysis of Distortion Sequences in Speech Recognition

Cited by: 5

Authors:

Chang, Pao-Chung [1]

Chen, Sin-Horng [2]

Juang, Biing-Hwang [3]

Affiliations:

[1] Minist Chung Li Commun, Telecommun Labs, Chungli, Taiwan

[2] Natl Chiao Tung Univ, Dept Commun Engn, Hsin Tsu, Taiwan

[3] AT&T Bell Labs, Murray Hill, NJ 07974 USA

Source:

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 1993 / Vol. 1 / No. 3

Keywords:

11;

DOI:

10.1109/89.232616

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

In a traditional speech recognition system, the distance score between a test token and a reference pattern is obtained by simply averaging the distortion sequence resulting from matching the two patterns through a dynamic programming procedure. The final decision is made by choosing the reference with the minimal average distance score. If we view the distortion sequence as a form of observed features, a decision rule based on a specific discriminant function designed for the distortion sequence obviously will perform better than one based on the simple average distortion. We therefore suggest in this paper a linear discriminant function of the form $\Delta = \sum_{i=1}^{T} \omega_i d_i$ to compute the distance score $\Delta$ instead of a direct average $\Delta = \frac{1}{T} \sum_{i=1}^{T} d_i$. Several adaptive algorithms are proposed to learn the discriminant weighting function. These include one heuristic method, two methods based on the error propagation algorithm [1], [2], and one method based on the generalized probabilistic descent (GPD) algorithm [3]. We study these methods in a speaker-independent speech recognition task involving utterances of the highly confusable English E-set (b, c, d, e, g, p, t, v, z). The results show that the best performance is obtained with the GPD method, which achieved 78.1% accuracy, compared to 67.6% with the traditional unweighted average method. Besides the experimental comparisons, an analytical discussion of the various training algorithms is also provided.
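
To make the two scoring rules in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes every distortion sequence has already been normalized to a common length T so that one weight vector applies to all of them. The function names, the distortion values, and the weight vector `onset_heavy` are hypothetical and chosen only for illustration; no training algorithm (heuristic, error propagation, or GPD) is shown.

```python
from typing import Dict, Optional, Sequence


def average_distance(d: Sequence[float]) -> float:
    """Traditional score: Delta = (1/T) * sum_i d(i)."""
    return sum(d) / len(d)


def weighted_distance(d: Sequence[float], w: Sequence[float]) -> float:
    """Linear discriminant score: Delta = sum_i w(i) * d(i)."""
    if len(d) != len(w):
        raise ValueError("distortion sequence and weights must have equal length")
    return sum(wi * di for wi, di in zip(w, d))


def classify(distortions: Dict[str, Sequence[float]],
             w: Optional[Sequence[float]] = None) -> str:
    """Return the reference label with the minimal distance score.

    With w=None the plain average is used; otherwise the weighted sum.
    """
    def score(d: Sequence[float]) -> float:
        return average_distance(d) if w is None else weighted_distance(d, w)

    return min(distortions, key=lambda label: score(distortions[label]))


# Hypothetical distortion sequences (T = 4) of one test token matched
# against two reference patterns; the values are made up for illustration.
distortions = {
    "b": [0.9, 0.2, 0.2, 0.1],
    "d": [0.3, 0.4, 0.4, 0.4],
}

# Uniform weights 1/T reproduce the plain average.
uniform = [0.25] * 4

# Hypothetical weights that emphasize the first frame (an assumption,
# not weights learned by any method from the paper).
onset_heavy = [0.7, 0.1, 0.1, 0.1]

print(classify(distortions))               # decision with the plain average
print(classify(distortions, onset_heavy))  # decision with the weighted score
```

With these made-up numbers the plain average favors "b" (0.35 against 0.375), while the onset-weighted score favors "d" (0.33 against 0.68). This shows how a weighting function over the distortion sequence can change the decision when the discriminative information is concentrated in a few frames.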

Citation:

Pages: 326-333

Page count: 8

References (11 in total):

[1] AMARI S, 1967, IEEE T ELECT COMPUTE, P299

[2] Duda R. O., 1973, PATTERN CLASSIFICATI, V3

[3] JUANG BH, 1987, IEEE T ACOUST SPEECH, V35, P947, DOI 10.1109/TASSP.1987.1165237

[4] KATAGIRI S, THEORY ADAP IN PRESS

[5] LIPPMANN RP, 1987, IEEE ASSP MAGAZINE, V4, P22

[6] RABINER, LR; WILPON, JG. A 2-PASS PATTERN-RECOGNITION APPROACH TO ISOLATED WORD RECOGNITION [J]. BELL SYSTEM TECHNICAL JOURNAL, 1981, 60 (05): 739-766

[7] Rumelhart DE, 1986, ENCY DATABASE SYST, P45

[8] SOONG, FK; ROSENBERG, AE. ON THE USE OF INSTANTANEOUS AND TRANSITIONAL SPECTRAL INFORMATION IN SPEAKER RECOGNITION [J]. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1988, 36 (06): 871-879

[9] SU KY, 1991, P IEEE ICASSP 91, P541

[10] WAIBEL A, READINGS SPEECH RECO, pCH7
