MAXIMUM-LIKELIHOOD LINEAR-REGRESSION FOR SPEAKER ADAPTATION OF CONTINUOUS DENSITY HIDDEN MARKOV-MODELS

Cited by: 1399
Authors
LEGGETTER, CJ
WOODLAND, PC
Affiliation
[1] Department of Engineering, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ
DOI
10.1006/csla.1995.0010
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A method of speaker adaptation for continuous density hidden Markov models (HMMs) is presented. An initial speaker-independent system is adapted to improve the modelling of a new speaker by updating the HMM parameters. Statistics are gathered from the available adaptation data and used to calculate a linear regression-based transformation for the mean vectors. The transformation matrices are calculated to maximize the likelihood of the adaptation data and can be implemented using the forward-backward algorithm. By tying the transformations among a number of distributions, adaptation can be performed for distributions which are not represented in the training data. An important feature of the method is that arbitrary adaptation data can be used: no special enrolment sentences are needed. Experiments have been performed on the ARPA RM1 database using an HMM system with cross-word triphones and mixture Gaussian output distributions. Results show that adaptation can be performed using as little as 11 s of adaptation data, and that as more data is used the adaptation performance improves. For example, using 40 adaptation utterances, a 37% reduction in error from the speaker-independent system was achieved with supervised adaptation and a 32% reduction in unsupervised mode.
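The abstract does not spell out the estimation formulas, so the following is only a minimal sketch of how a likelihood-maximizing linear transformation of the mean vectors might be computed. It assumes diagonal covariances, a single transform tied across all Gaussians, and occupation probabilities that have already been computed (in practice the forward-backward algorithm would supply them). The function names `estimate_mllr_mean_transform` and `adapt_means` and the array layouts are hypothetical, chosen for illustration.

```python
import numpy as np

def estimate_mllr_mean_transform(means, variances, obs, gamma):
    """Estimate one shared transform W (n x (n+1)) for Gaussian means.

    means:     (M, n) speaker-independent mean vectors
    variances: (M, n) diagonal covariances (assumption of this sketch)
    obs:       (T, n) adaptation observations
    gamma:     (T, M) occupation probabilities, assumed precomputed
    The adapted mean is W @ [1, mu], i.e. a bias plus a linear map.
    """
    M, n = means.shape
    xi = np.hstack([np.ones((M, 1)), means])  # extended means, (M, n+1)
    occ = gamma.sum(axis=0)                   # total occupancy per Gaussian
    obs_sum = gamma.T @ obs                   # occupancy-weighted observation sums
    W = np.zeros((n, n + 1))
    # With diagonal covariances each row of W solves its own linear system.
    for i in range(n):
        inv_var = 1.0 / variances[:, i]
        G = (xi * (occ * inv_var)[:, None]).T @ xi  # (n+1, n+1)
        z = xi.T @ (inv_var * obs_sum[:, i])        # (n+1,)
        W[i] = np.linalg.solve(G, z)
    return W

def adapt_means(means, W):
    """Apply W to every mean, including Gaussians unseen in the adaptation data."""
    xi = np.hstack([np.ones((means.shape[0], 1)), means])
    return xi @ W.T
```

Because the transform is shared, `adapt_means` also moves the means of Gaussians that received no adaptation data, which is the tying idea the abstract describes. The linear systems are only solvable when the Gaussians that did receive data have extended means spanning all n+1 dimensions.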
Pages: 171-185
Number of pages: 15