Semi-tied covariance matrices for hidden Markov models

Cited by: 339

Authors:

Gales, MJF ^{[1]}

Affiliations:

[1] Univ Cambridge, Dept Engn, Cambridge CB2 1PZ, England

Source:

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 1999 / Vol. 7 / No. 03

Keywords:

correlation modeling; hidden Markov models; speech recognition;

DOI:

10.1109/89.759034

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

There is normally a simple choice made in the form of the covariance matrix to be used with continuous density HMM's. Either a diagonal covariance matrix is used, with the underlying assumption that elements of the feature vector are independent, or a full or block-diagonal matrix is used, where all or some of the correlations are explicitly modeled. Unfortunately, when using full or block-diagonal covariance matrices there tends to be a dramatic increase in the number of parameters per Gaussian component, limiting the number of components which may be robustly estimated. This paper introduces a new form of covariance matrix which allows a few "full" covariance matrices to be shared over many distributions, whilst each distribution maintains its own "diagonal" covariance matrix. In contrast to other schemes which have hypothesized a similar form, this technique fits within the standard maximum-likelihood criterion used for training HMM's. The new form of covariance matrix is evaluated on a large-vocabulary speech-recognition task. In initial experiments the performance of the standard system was achieved using approximately half the number of parameters. Moreover, a 10% reduction in word error rate compared to a standard system can be achieved with less than a 1% increase in the number of parameters and little increase in recognition time.
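
The parameter trade-off described in the abstract can be illustrated with a rough count. The sketch below is not taken from the paper: it assumes each Gaussian component has a d-dimensional mean, that a "full" covariance has d(d+1)/2 free parameters, and that each shared "full" matrix in the semi-tied scheme costs d*d parameters on top of the per-component diagonal variances. The function names and the example sizes (39 dimensions, 10000 components, one shared matrix) are hypothetical.

```python
# Rough parameter counts for three covariance schemes (illustrative only).
# Assumptions, not figures from the paper: d-dimensional means, symmetric
# full covariances, and d*d parameters per shared "full" matrix.

def diag_params(d: int, n_components: int) -> int:
    """Mean (d) plus diagonal variances (d) for every component."""
    return n_components * (2 * d)


def full_params(d: int, n_components: int) -> int:
    """Mean (d) plus a symmetric full covariance (d*(d+1)/2) per component."""
    return n_components * (d + d * (d + 1) // 2)


def semi_tied_params(d: int, n_components: int, n_shared: int) -> int:
    """Diagonal system plus n_shared shared d-by-d matrices."""
    return diag_params(d, n_components) + n_shared * d * d


if __name__ == "__main__":
    d, n_components, n_shared = 39, 10000, 1  # hypothetical sizes
    base = diag_params(d, n_components)
    tied = semi_tied_params(d, n_components, n_shared)
    print("diagonal: ", base)
    print("full:     ", full_params(d, n_components))
    print("semi-tied:", tied)
    print("relative increase over diagonal:", (tied - base) / base)
```

Under these assumptions the shared matrices add a fixed cost that does not grow with the number of components, which is consistent with the small parameter increase the abstract reports.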

Citation

Pages: 272-281

Page count: 10

