Semi-tied covariance matrices for hidden Markov models

Times cited: 339
Author
Gales, MJF [1]
Affiliation
[1] Univ Cambridge, Dept Engn, Cambridge CB2 1PZ, England
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 1999 / Vol. 7 / No. 3
Keywords
correlation modeling; hidden Markov models; speech recognition;
DOI
10.1109/89.759034
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
There is normally a simple choice made in the form of the covariance matrix to be used with continuous-density HMMs. Either a diagonal covariance matrix is used, with the underlying assumption that the elements of the feature vector are independent, or a full or block-diagonal matrix is used, where all or some of the correlations are explicitly modeled. Unfortunately, when full or block-diagonal covariance matrices are used, there tends to be a dramatic increase in the number of parameters per Gaussian component, limiting the number of components that may be robustly estimated. This paper introduces a new form of covariance matrix which allows a few "full" covariance matrices to be shared over many distributions, whilst each distribution maintains its own "diagonal" covariance matrix. In contrast to other schemes which have hypothesized a similar form, this technique fits within the standard maximum-likelihood criterion used for training HMMs. The new form of covariance matrix is evaluated on a large-vocabulary speech-recognition task. In initial experiments, the performance of the standard system was achieved using approximately half the number of parameters. Moreover, a 10% reduction in word error rate compared to a standard system can be achieved with less than a 1% increase in the number of parameters and little increase in recognition time.
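The abstract describes the shared "full" plus per-distribution "diagonal" structure in words only. The following is a minimal sketch of how such a covariance can be evaluated. It assumes the notation Sigma_m = H diag(sigma_m^2) H^T, where one full matrix H is shared by many components m, and works with A = H^{-1}. The symbols H and A and every function name below are this sketch's own and do not come from the record. The sketch relies on the identity log N(o; mu, H D H^T) = log|det A| + log N(A o; A mu, D). Because of this identity, each frame needs to be transformed only once per shared matrix, and each tied component then costs only a diagonal-Gaussian evaluation. This is one plausible reading of the "little increase in recognition time" claim. The sketch covers likelihood evaluation only. Maximum-likelihood estimation of A and the diagonal variances is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]
Vector = Sequence[float]


def matvec(a: Matrix, v: Vector) -> List[float]:
    """Dense matrix-vector product a @ v."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def log_abs_det(a: Matrix) -> float:
    """log|det(a)| by Gaussian elimination with partial pivoting."""
    m = [list(row) for row in a]
    n = len(m)
    total = 0.0
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        if m[p][k] == 0.0:
            raise ValueError("transform is singular")
        m[k], m[p] = m[p], m[k]  # row swap only flips the sign of det
        total += math.log(abs(m[k][k]))
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n):
                m[i][j] -= f * m[k][j]
    return total


def diag_log_gauss(x: Vector, mean: Vector, var: Vector) -> float:
    """log N(x; mean, diag(var))."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - mi) ** 2 / v
        for xi, mi, v in zip(x, mean, var)
    )


def semi_tied_log_likelihood(o: Vector, mean: Vector, diag_var: Vector,
                             a: Matrix, log_det_a: float) -> float:
    """log N(o; mean, H diag(diag_var) H^T) with H = A^{-1} (hypothetical
    notation), computed as log|det A| + log N(A o; A mean, diag(diag_var))."""
    return log_det_a + diag_log_gauss(matvec(a, o), matvec(a, mean), diag_var)


def score_components(o: Vector, a: Matrix,
                     components: List[Tuple[Vector, Vector]]) -> List[float]:
    """Score all components tied to one transform `a`.

    `components` holds (A @ mean, diag_var) pairs, with A @ mean precomputed
    once per component; the frame `o` is transformed once for all of them.
    """
    log_det_a = log_abs_det(a)  # could be cached per transform
    ao = matvec(a, o)
    return [log_det_a + diag_log_gauss(ao, am, dv) for am, dv in components]


def covariance_param_counts(d: int, n_components: int,
                            n_transforms: int) -> Dict[str, int]:
    """Covariance parameters only (means and mixture weights excluded)."""
    return {
        "diagonal": n_components * d,
        "full": n_components * d * (d + 1) // 2,
        "semi_tied": n_components * d + n_transforms * d * d,
    }
```

As a usage example, `covariance_param_counts(39, 1000, 1)` returns `{"diagonal": 39000, "full": 780000, "semi_tied": 40521}`. The 39-dimensional features and 1000 components are hypothetical sizes chosen only to show how one shared d-by-d matrix compares with per-component full covariances. They are not figures from the paper.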
Pages: 272-281
Number of pages: 10