Probabilistic-trajectory segmental HMMs

Cited by: 55

Authors:

Holmes, WJ [1]

Russell, MJ [1]

Affiliation:

[1] DERA Malvern, Speech Res Unit, Malvern WR14 3PS, Worcs, England

Source:

COMPUTER SPEECH AND LANGUAGE | 1999 / Vol. 13 / Issue 01


DOI:

10.1006/csla.1998.0048

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Segmental hidden Markov models (SHMMs) are intended to overcome important speech-modelling limitations of the conventional-HMM approach by representing sequences (or segments) of features and incorporating the concept of trajectories to describe how features change over time. A novel feature of the approach presented in this paper is that extra-segmental variability between different examples of a sub-phonemic speech segment is modelled separately from intra-segmental variability within any one example. The extra-segmental component of the model is represented in terms of variability in the trajectory parameters, and these models are therefore referred to as "probabilistic-trajectory segmental HMMs" (PTSHMMs). This paper presents the theory of PTSHMMs using a linear trajectory description characterized by slope and mid-point parameters, and presents theoretical and experimental comparisons between different types of PTSHMMs, simpler SHMMs and conventional HMMs. Experiments have demonstrated that, for any given feature set, a linear PTSHMM can substantially reduce the error rate in comparison with a conventional HMM, both for a connected-digit recognition task and for a phonetic classification task. Performance benefits have been demonstrated from incorporating a linear trajectory description and additionally from modelling variability in the mid-point parameter. (C) 1999 British Crown Copyright/DERA.
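
The separation of extra-segmental from intra-segmental variability can be illustrated with a minimal sketch. It is not the paper's formulation: the abstract says only that the trajectory is linear with slope and mid-point parameters and that variability in the mid-point is modelled. The sketch assumes a single scalar feature, a fixed slope, a Gaussian distribution over the mid-point (variance `var_mid`), and independent Gaussian frame noise around the trajectory (variance `var_intra`). The function name and all parameter names are hypothetical.

```python
import math


def segment_log_likelihood(y, mu_mid, slope, var_mid, var_intra):
    """Log-likelihood of one scalar feature segment y[0..T-1].

    Assumed model (illustrative only, not taken from the paper):
      mid-point c ~ N(mu_mid, var_mid)               (extra-segmental)
      y[t] ~ N(c + slope * (t - centre), var_intra)  (intra-segmental)
    The mid-point c is integrated out analytically.
    """
    T = len(y)
    centre = (T - 1) / 2.0
    # Residuals from the mean trajectory mu_mid + slope * (t - centre).
    r = [y_t - (mu_mid + slope * (t - centre)) for t, y_t in enumerate(y)]
    sum_r = sum(r)
    sum_r2 = sum(v * v for v in r)
    a, b = var_intra, var_mid
    # Marginal covariance is a*I + b*(ones * ones^T).
    # Its determinant is a^(T-1) * (a + T*b); the inverse follows from
    # the Sherman-Morrison formula.
    log_det = (T - 1) * math.log(a) + math.log(a + T * b)
    quad = (sum_r2 - b * sum_r ** 2 / (a + T * b)) / a
    return -0.5 * (T * math.log(2 * math.pi) + log_det + quad)
```

Under these assumptions, setting `var_mid` to 0 collapses the model to a single fixed linear trajectory with independent frame noise, which makes the contribution of the mid-point variability term easy to isolate.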


Pages: 3-37

Page count: 35

