Structured speech modeling

Cited by: 48
Authors
Deng, Li [1]
Yu, Dong [1]
Acero, Alex [1]
Affiliation
[1] Microsoft Corp, Redmond, WA 98052 USA
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2006 / Vol. 14 / No. 5
Keywords
hidden dynamics; hidden trajectory; long span modeling; maximum-likelihood; nonlinear prediction; parameter learning; structured modeling; vocal tract resonance
DOI
10.1109/TASL.2006.878265
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Modeling the dynamic structure of speech is a novel paradigm in speech recognition research within the generative modeling framework, and it offers the potential to overcome limitations of the current hidden Markov modeling approach. Analogous to structured language models, where syntactic structure is exploited to represent long-distance relationships among words [5], the structured speech model described in this paper makes use of the dynamic structure in the hidden vocal tract resonance space to characterize long-span contextual influence among phonetic units. A general overview of the hierarchically classified types of dynamic speech models in the literature is provided first. A detailed account is then given of a specific model type called the hidden trajectory model, including the steps of model construction and the parameter estimation algorithms. We show how the use of resonance target parameters and their temporal filtering enables joint modeling of long-span coarticulation and phonetic reduction effects. Phonetic recognition experiments demonstrate recognizer performance superior to that of a modern hidden Markov model-based system. Error analysis shows that the greatest performance gain occurs within the sonorant speech class.
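The mechanism the abstract names, temporally filtering a sequence of phone-dependent resonance targets so that each frame is pulled toward its neighbors' targets, can be illustrated with a small sketch. It assumes a symmetric, exponentially decaying FIR filter of the form z(k) = sum over tau from k-D to k+D of c_gamma * gamma(tau)^|k-tau| * t(tau), with normalizer c_gamma = (1-gamma)/(1+gamma-2*gamma^(D+1)), applied to each resonance separately. The function names `normalizer` and `hidden_trajectory`, the edge handling, and all example values are hypothetical, and the paper's exact parameterization may differ.

```python
def normalizer(gamma: float, D: int) -> float:
    """Constant c_gamma making sum_{m=-D}^{D} c_gamma * gamma**|m| equal 1.

    Requires 0 <= gamma < 1 (gamma == 1 would divide by zero).
    """
    return (1.0 - gamma) / (1.0 + gamma - 2.0 * gamma ** (D + 1))


def hidden_trajectory(targets: list[float], gammas: list[float], D: int) -> list[float]:
    """Smooth a per-frame target sequence for ONE resonance with a symmetric FIR filter.

    targets[k]: resonance target of the phone active at frame k (hypothetical values).
    gammas[k]:  decay parameter of the phone active at frame k (hypothetical values).
    D:          one-sided filter length in frames.
    """
    K = len(targets)
    z = []
    for k in range(K):
        # Normalizer taken from the phone at the current frame (assumption).
        c = normalizer(gammas[k], D)
        acc = 0.0
        for tau in range(k - D, k + D + 1):
            # Repeat the first/last frame beyond the utterance edges (assumption).
            j = min(max(tau, 0), K - 1)
            # Neighboring targets contribute with weight decaying as gamma**|k - tau|.
            acc += c * gammas[j] ** abs(k - tau) * targets[j]
        z.append(acc)
    return z
```

With a constant gamma the filter weights sum to 1, so a constant target sequence passes through unchanged, while a step between phones becomes a smooth transition. A two-frame segment with target 1.0 surrounded by frames with target 0.0 never reaches 1.0 (with gamma = 0.6 and D = 3 it peaks below 0.5), which is one way to picture the target undershoot associated with phonetic reduction.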
Pages: 1492-1504
Number of pages: 13
References (59 total)
[1] MODELING OF CONTEXTUAL EFFECTS BASED ON SPECTRAL PEAK INTERACTION [J].
Akagi, M.
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1993, 93(2): 1076-1086
[2] [Anonymous], 2005, INTERSPEECH 2005
[3] [Anonymous], P INTERSPEECH
[4] [Anonymous], P IEEE INT C AC SPEE
[5] Atal, B. S., 1983, Proceedings of ICASSP 83, IEEE International Conference on Acoustics, Speech and Signal Processing, p. 81
[6] Graphical model architectures for speech recognition [J].
Bilmes, JA; Bartels, C.
IEEE SIGNAL PROCESSING MAGAZINE, 2005, 22(5): 89-100
[7] Bridle, J., 1998, FINAL REPORT 1998 WO, p. 1
[8] Structured language modeling [J].
Chelba, C; Jelinek, F.
COMPUTER SPEECH AND LANGUAGE, 2000, 14(4): 283-332
[9] Production models as a structural basis for automatic speech recognition [J].
Deng, L; Ramsay, G; Sun, D.
SPEECH COMMUNICATION, 1997, 22(2-3): 93-111
[10] Spontaneous speech recognition using a statistical coarticulatory model for the vocal-tract-resonance dynamics [J].
Deng, L; Ma, J.
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2000, 108(6): 3036-3048