Segmental modeling using a continuous mixture of nonparametric models

Cited by: 9
Authors
Goldberger, J [1]
Burshtein, D
Franco, H
Affiliations
[1] Tel Aviv Univ, IL-69978 Tel Aviv, Israel
[2] SRI Int, Menlo Park, CA 94025 USA
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 1999 / Vol. 7 / No. 3
Keywords
hidden Markov models; mixture models; segmental modeling; speech recognition
DOI
10.1109/89.759032
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
A major limitation of hidden Markov model (HMM) based automatic speech recognition is the inherent assumption that successive observations within a state are independent and identically distributed (IID). The IID assumption is reasonable for some states (e.g., a state that corresponds to a steady-state vowel). However, most states clearly violate this assumption (e.g., states corresponding to vowel-consonant transitions, diphthongs, etc.) and are in fact characterized by a highly correlated and nonstationary speech signal. In recent years, alternative models have been proposed that attempt to describe the dynamics of the signal within a phonetic unit. This approach is generally known as segmental modeling, since the speech signal is modeled at the segment level rather than at the frame level (as in an HMM). We propose a family of new segmental models composed of two elements. The first element is a nonparametric representation of the mean and variance trajectories, and the second is a parameterized transformation of the trajectory (e.g., a random shift) that is global to the entire segment. The new model is in fact a continuous mixture of segment trajectories. We present recognition results on a large-vocabulary task, and compare the model to alternative segment models on a triphone recognition task.
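The "continuous mixture" idea in the abstract can be illustrated with a small sketch. It is not taken from the paper: it assumes scalar features, a fixed-length segment, and a zero-mean Gaussian random shift, and the names `segment_loglik_shift`, `mean_traj`, `var_traj`, and `shift_var` are hypothetical. Under those assumptions each frame is x_t = mean_t + b + e_t with one shift b shared by the whole segment, and integrating b out gives a Gaussian whose covariance is the diagonal of per-frame variances plus a rank-one term.

```python
import numpy as np

def segment_loglik_shift(x, mean_traj, var_traj, shift_var):
    """Log-likelihood of one scalar segment with the shared shift integrated out.

    Assumed model (illustrative only): x_t = mean_traj[t] + b + e_t,
    b ~ N(0, shift_var) drawn once per segment, e_t ~ N(0, var_traj[t]).
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mean_traj, dtype=float)
    var = np.asarray(var_traj, dtype=float)
    T = x.shape[0]
    # Marginal covariance: independent per-frame noise plus a rank-one
    # term that couples all frames through the common shift b.
    cov = np.diag(var) + shift_var * np.ones((T, T))
    resid = x - mu
    _, logdet = np.linalg.slogdet(cov)
    quad = resid @ np.linalg.solve(cov, resid)
    return -0.5 * (T * np.log(2.0 * np.pi) + logdet + quad)

# A segment that follows the mean trajectory but is offset as a whole.
mean_traj = np.array([0.0, 1.0, 2.0])
var_traj = np.array([0.05, 0.05, 0.05])
segment = mean_traj + 0.7

ll_with_shift = segment_loglik_shift(segment, mean_traj, var_traj, shift_var=0.5)
ll_no_shift = segment_loglik_shift(segment, mean_traj, var_traj, shift_var=0.0)
print(ll_with_shift > ll_no_shift)
```

Setting `shift_var` to zero recovers independent frames around a fixed trajectory, so the comparison shows how a segment-level shift can explain a uniformly offset segment that a frame-independent model penalizes at every frame.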
Pages: 262-271
Page count: 10