Speech Recognition Using Weighted HMM and Subspace Projection Approaches

Cited by: 26
Authors
Su, Keh-Yih [2]
Lee, Chin-Hui [1]
Affiliations
[1] AT&T Bell Labs, Speech Res Dept, Murray Hill, NJ 07974 USA
[2] Natl Tsing Hua Univ, Dept Elect Engn, Hsinchu 300, Taiwan
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 1994 / Vol. 2 / No. 1
Keywords
DOI
10.1109/89.260336
Chinese Library Classification number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
In this paper, a weighted hidden Markov model (HMM) algorithm and a subspace projection algorithm are proposed to address the discrimination and robustness issues for HMM-based speech recognition. A robust two-stage classifier is also proposed to incorporate these two approaches to further improve the performance. The weighted HMM enhances its discrimination power by first jointly considering the state likelihoods of different word models, then assigning a weight to the likelihood of each state, according to its contribution in discriminating words. The robustness of this model is then improved by increasing the likelihood difference between the top and the second candidates. The subspace projection approach discards unreliable observations on the basis of maximizing the divergence between different word pairs. To improve robustness, the mean of each cluster is then adjusted to obtain maximum separation between different clusters. The performance was evaluated with a highly confusable vocabulary consisting of the nine English E-set words. The test was conducted in a multispeaker (100 talkers), isolated-word mode. The 61.7% word accuracy for the original HMM-based system was improved to 74.9% and 76.6%, respectively, by using the weighted HMM and the subspace projection methods. By incorporating the weighted HMM in the first stage and the subspace projection in the second stage, the two-stage classifier achieved a word accuracy of 79.4%.
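The accuracy figures in the abstract can also be read as relative reductions in word error rate, which makes the gains easier to compare. The sketch below is only arithmetic on the four percentages quoted above; the function name `relative_error_reduction` and the dictionary layout are illustrative choices, not anything taken from the paper.

```python
def relative_error_reduction(baseline_acc, improved_acc):
    """Fraction of the baseline's errors removed (accuracies given in percent)."""
    baseline_err = 100.0 - baseline_acc
    improved_err = 100.0 - improved_acc
    return (baseline_err - improved_err) / baseline_err


# Word accuracies quoted in the abstract (percent).
BASELINE = 61.7  # original HMM-based system
SYSTEMS = {
    "weighted HMM": 74.9,
    "subspace projection": 76.6,
    "two-stage classifier": 79.4,
}

# Relative error reduction of each method against the baseline.
reductions = {
    name: relative_error_reduction(BASELINE, acc)
    for name, acc in SYSTEMS.items()
}

for name, r in reductions.items():
    print(f"{name}: {100 * r:.1f}% relative error reduction")
```

Under this reading, the weighted HMM removes roughly a third of the baseline errors, the subspace projection a little more, and the two-stage classifier close to half.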
Pages: 69-79
Page count: 11