Speech recognition using linear dynamic models

Cited by: 23

Authors:

Frankel, Joe [1]

King, Simon [1]

Affiliation:

[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh EH8 9LW, Midlothian, Scotland

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2007 / Vol. 15 / No. 01

Funding:

UK Engineering and Physical Sciences Research Council (EPSRC);

Keywords:

automatic speech recognition (ASR); linear dynamic models (LDMs); stack decoding;

DOI:

10.1109/TASL.2006.876766

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

The majority of automatic speech recognition systems rely on hidden Markov models, in which Gaussian mixtures model the output distributions associated with sub-phone states. This approach, whilst successful, models consecutive feature vectors (augmented to include derivative information) as statistically independent. Furthermore, spatial correlations present in speech parameters are frequently ignored through the use of diagonal covariance matrices. This paper continues the work of Digalakis and others who proposed instead a first-order linear state-space model which has the capacity to model underlying dynamics, and furthermore give a model of spatial correlations. This paper examines the assumptions made in applying such a model and shows that the addition of a hidden dynamic state leads to increases in accuracy over otherwise equivalent static models. We also propose a time-asynchronous decoding strategy suited to recognition with segment models. We describe implementation of decoding for linear dynamic models and present TIMIT phone recognition results.
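For orientation, a first-order linear state-space model of the kind the abstract refers to is commonly written in the generic form below. This is a textbook sketch, not the paper's own formulation: the symbols $x_t$ (hidden state), $y_t$ (observed feature vector), $F$, $H$, $Q$, and $R$ are notational assumptions introduced here and do not appear in this record.

```latex
\begin{align}
  x_t &= F\,x_{t-1} + w_t, & w_t &\sim \mathcal{N}(0, Q) \\
  y_t &= H\,x_t + v_t,     & v_t &\sim \mathcal{N}(0, R)
\end{align}
```

Under these assumptions, the hidden state $x_t$ carries the dynamics from one frame to the next, while a full (non-diagonal) $R$ together with $H$ can capture correlations between feature dimensions. Setting $F = 0$ removes the frame-to-frame dependence and leaves a static model, which is one way to read the abstract's comparison with "otherwise equivalent static models".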

Citation:

Pages: 246-256

Page count: 11

28 references in total

[1]

BILMES J, P ICASSP 2000

[2]

BRIDLE JS, 1998, INVESTIGATION SEGMEN

[3] Spontaneous speech recognition using a statistical coarticulatory model for the vocal-tract-resonance dynamics [J].

Deng, L ;

Ma, J .

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2000, 108 (06) :3036-3048

[4] ML Estimation of a Stochastic Linear System with the EM Algorithm and Its Application to Speech Recognition [J].

Digalakis, V. ;

Rohlicek, J. R. ;

Ostendorf, M. .

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1993, 1 (04) :431-442

[5]

DIGALAKIS V, 1992, THESIS BOSTON U GRAD

[6]

Frankel J, 2003, THESIS U EDINBURGH E

[7] Mean and variance adaptation within the MLLR framework [J].

Gales, MJF ;

Woodland, PC .

COMPUTER SPEECH AND LANGUAGE, 1996, 10 (04) :249-264

[8]

ISO K, 1993, P INT C AC SPEECH SI, V2, P283

[9]

Julier S., 1996, GEN METHOD APPROXIMA

[10]

Kalman RE., 1960, J BASIC ENG, V82D, P35, DOI DOI 10.1115/1.3662552
