Discriminative Analysis of Distortion Sequences in Speech Recognition

Cited by: 5
Authors
Chang, Pao-Chung [1]
Chen, Sin-Horng [2]
Juang, Biing-Hwang [3]
Affiliations
[1] Minist Chung Li Commun, Telecommun Labs, Chungli, Taiwan
[2] Natl Chiao Tung Univ, Dept Commun Engn, Hsin Tsu, Taiwan
[3] AT&T Bell Labs, Murray Hill, NJ 07974 USA
Source
IEEE Transactions on Speech and Audio Processing | 1993 / Vol. 1 / No. 3
DOI
10.1109/89.232616
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
In a traditional speech recognition system, the distance score between a test token and a reference pattern is obtained by simply averaging the distortion sequence that results from matching the two patterns through a dynamic programming procedure. The final decision is made by choosing the reference pattern with the minimal average distance score. If we view the distortion sequence as a form of observed features, a decision rule based on a discriminant function designed specifically for the distortion sequence can be expected to perform better than one based on the simple average distortion. We therefore suggest in this paper a linear discriminant function of the form $\Delta = \sum_{i=1}^{T} \omega_i d_i$ to compute the distance score $\Delta$, instead of the direct average $\Delta = \frac{1}{T} \sum_{i=1}^{T} d_i$. Several adaptive algorithms are proposed to learn the discriminant weighting function. These include one heuristic method, two methods based on the error propagation algorithm [1], [2], and one method based on the generalized probabilistic descent (GPD) algorithm [3]. We study these methods in a speaker-independent speech recognition task involving utterances of the highly confusable English E-set (b, c, d, e, g, p, t, v, z). The results show that the best performance is obtained with the GPD method, which achieved 78.1% accuracy, compared with 67.6% for the traditional unweighted average method. Besides the experimental comparisons, an analytical discussion of the various training algorithms is also provided.
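The two scoring rules contrasted in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the example numbers, and the assumption that every distortion sequence has already been normalized to a fixed length T (so that one weight vector applies) are all hypothetical. How the weights are trained (heuristic, error propagation, or GPD) is not shown.

```python
from typing import Dict, Sequence


def average_score(d: Sequence[float]) -> float:
    """Traditional score: plain average of the distortion sequence."""
    if len(d) == 0:
        raise ValueError("distortion sequence must not be empty")
    return sum(d) / len(d)


def weighted_score(d: Sequence[float], w: Sequence[float]) -> float:
    """Linear discriminant score: sum over i of w[i] * d[i].

    Assumes (hypothetically) that d and w share the same length T.
    """
    if len(d) != len(w):
        raise ValueError("weights and distortions must have equal length")
    return sum(wi * di for wi, di in zip(w, d))


def decide(scores: Dict[str, float]) -> str:
    """Pick the reference label with the minimal distance score."""
    return min(scores, key=scores.get)


if __name__ == "__main__":
    # Made-up distortion sequences of a test token against two references.
    d_b = [0.5, 1.0, 1.5, 2.0]
    d_d = [1.0, 1.0, 1.5, 1.0]
    # Made-up weights that emphasize the early frames.
    w = [0.4, 0.3, 0.2, 0.1]
    print(decide({"b": average_score(d_b), "d": average_score(d_d)}))
    print(decide({"b": weighted_score(d_b, w), "d": weighted_score(d_d, w)}))
```

With uniform weights of 1/T, `weighted_score` reduces to `average_score`, so the unweighted average is a special case of the linear discriminant form.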
Pages: 326-333
Page count: 8
Related papers
11 entries in total
[1] AMARI S, 1967, IEEE T ELECT COMPUTE, P299
[2] Duda R. O., 1973, PATTERN CLASSIFICATI, V3
[3] JUANG BH, 1987, IEEE T ACOUST SPEECH, V35, P947, DOI 10.1109/TASSP.1987.1165237
[4] KATAGIRI S, THEORY ADAP IN PRESS
[5] LIPPMANN RP, 1987, IEEE ASSP MAGAZINE, V4, P22
[6] RABINER, LR; WILPON, JG. A 2-PASS PATTERN-RECOGNITION APPROACH TO ISOLATED WORD RECOGNITION [J]. BELL SYSTEM TECHNICAL JOURNAL, 1981, 60 (05): 739-766
[7] Rumelhart DE, 1986, ENCY DATABASE SYST, P45
[8] SOONG, FK; ROSENBERG, AE. ON THE USE OF INSTANTANEOUS AND TRANSITIONAL SPECTRAL INFORMATION IN SPEAKER RECOGNITION [J]. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1988, 36 (06): 871-879
[9] SU KY, 1991, P IEEE ICASSP 91, P541
[10] WAIBEL A, READINGS SPEECH RECO, pCH7