Probabilistic-trajectory segmental HMMs

Cited by: 55
Authors
Holmes, WJ [1]
Russell, MJ [1]
Affiliations
[1] DERA Malvern, Speech Res Unit, Malvern WR14 3PS, Worcs, England
DOI
10.1006/csla.1998.0048
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Segmental hidden Markov models (SHMMs) are intended to overcome important speech-modelling limitations of the conventional-HMM approach by representing sequences (or segments) of features and incorporating the concept of trajectories to describe how features change over time. A novel feature of the approach presented in this paper is that extra-segmental variability between different examples of a sub-phonemic speech segment is modelled separately from intra-segmental variability within any one example. The extra-segmental component of the model is represented in terms of variability in the trajectory parameters, and these models are therefore referred to as "probabilistic-trajectory segmental HMMs" (PTSHMMs). This paper presents the theory of PTSHMMs using a linear trajectory description characterized by slope and mid-point parameters, and presents theoretical and experimental comparisons between different types of PTSHMMs, simpler SHMMs and conventional HMMs. Experiments have demonstrated that, for any given feature set, a linear PTSHMM can substantially reduce the error rate in comparison with a conventional HMM, both for a connected-digit recognition task and for a phonetic classification task. Performance benefits have been demonstrated from incorporating a linear trajectory description and additionally from modelling variability in the mid-point parameter. (C) 1999 British Crown Copyright/DERA.
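The separation of extra-segmental and intra-segmental variability described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: it assumes a single scalar feature, a fixed slope, a Gaussian-distributed mid-point (the extra-segmental part) and independent Gaussian scatter of frames around the trajectory (the intra-segmental part). All function and parameter names are hypothetical.

```python
import math


def trajectory(midpoint, slope, num_frames):
    """Linear trajectory values, with time measured from the segment centre."""
    centre = (num_frames - 1) / 2.0
    return [midpoint + slope * (t - centre) for t in range(num_frames)]


def log_gauss(x, mean, var):
    """Log density of a scalar Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def fixed_trajectory_loglik(frames, midpoint, slope, intra_var):
    """Log-likelihood of a segment around one fixed trajectory
    (intra-segmental variability only)."""
    traj = trajectory(midpoint, slope, len(frames))
    return sum(log_gauss(y, f, intra_var) for y, f in zip(frames, traj))


def marginal_loglik(frames, mid_mean, mid_var, slope, intra_var):
    """Log-likelihood with the mid-point integrated out (assumption: the
    mid-point is Gaussian with mean mid_mean and variance mid_var).

    The frames are then jointly Gaussian with covariance
    intra_var * I + mid_var * 1 1^T, whose determinant and inverse have
    closed forms (matrix determinant lemma, Sherman-Morrison).
    """
    n = len(frames)
    traj = trajectory(mid_mean, slope, n)
    resid = [y - f for y, f in zip(frames, traj)]
    sum_r = sum(resid)
    sum_r2 = sum(r * r for r in resid)
    total = intra_var + n * mid_var
    quad = (sum_r2 - mid_var * sum_r ** 2 / total) / intra_var
    log_det = (n - 1) * math.log(intra_var) + math.log(total)
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)
```

Under these assumptions, setting `mid_var` to zero reduces `marginal_loglik` to `fixed_trajectory_loglik`, that is, to a segment model with a single fixed trajectory per state.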
Pages: 3-37
Number of pages: 35