A STUDY ON SPEAKER ADAPTATION OF THE PARAMETERS OF CONTINUOUS DENSITY HIDDEN MARKOV MODELS

Cited by: 132
Authors
LEE, CH
LIN, CH
JUANG, BH
Affiliations
[1] Speech Research Department, AT&T Bell Laboratories, Murray Hill, NJ 07974
[2] Telecommunication Laboratories, Chung-Li
DOI
10.1109/78.80902
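Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification codes
0808; 0809
Abstract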
It is generally agreed that, for a given speech recognition task, a speaker-dependent system usually outperforms a speaker-independent system, as long as a sufficient amount of training data is available. When the amount of speaker-specific training data is limited, however, such a performance gain is not guaranteed. One way to improve the performance is to make use of existing knowledge, contained in a rich speaker-independent (or multispeaker) database, so that a minimum amount of training data is sufficient to model the new speaker. Such a training procedure is often referred to as speaker adaptation when the a priori knowledge is derived from a speaker-independent (or multispeaker) database, and as speaker conversion when the knowledge is derived from a different speaker. We mainly address the speaker adaptation issue here. For a speech recognition system based on continuous density hidden Markov models (CDHMM), speaker adaptation of the parameters of the CDHMM is formulated as a Bayesian learning procedure. In this study we present a speaker adaptation procedure which is easily integrated into the segmental k-means training procedure for obtaining adaptive estimates of the CDHMM parameters. We report on some results for adapting both the mean and the diagonal covariance matrix of the Gaussian state observation densities of a CDHMM. When tested on a 39-word English alpha-digit vocabulary in isolated word mode, the speaker adaptation procedure achieves the same level of performance as a speaker-independent system when one training token from each word is used to perform speaker adaptation. The results also show that much better performance is achieved when two or more training tokens are used for speaker adaptation. When compared with the speaker-dependent system, we found that the performance of speaker adaptation is always equal to or better than that of speaker-dependent training using the same amount of training data.
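The Bayesian formulation mentioned in the abstract can be illustrated with the textbook conjugate-prior update for a Gaussian mean. This is a minimal sketch and not the paper's exact estimator: it assumes a known variance, treats the speaker-independent mean as the prior mean, and uses a hypothetical prior weight `tau`; the function name `map_adapt_mean` is likewise an illustrative choice. Covariance adaptation, which the paper also reports, is not covered here.

```python
def map_adapt_mean(prior_mean, tau, frames):
    """MAP estimate of a Gaussian mean under a conjugate Gaussian prior.

    prior_mean: speaker-independent mean vector (assumed prior mean).
    tau: positive prior weight (hypothetical parameter); larger values
         keep the estimate closer to the prior mean.
    frames: adaptation feature vectors assigned to this Gaussian.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    n = len(frames)
    dim = len(prior_mean)
    # Accumulate the per-dimension sum of the adaptation frames.
    sums = [0.0] * dim
    for frame in frames:
        for d in range(dim):
            sums[d] += frame[d]
    # Weighted combination of the prior mean and the observed data.
    return [(tau * prior_mean[d] + sums[d]) / (tau + n) for d in range(dim)]


if __name__ == "__main__":
    si_mean = [0.0, 0.0]                   # speaker-independent prior mean
    new_frames = [[2.0, 4.0], [2.0, 4.0]]  # frames from the new speaker
    print(map_adapt_mean(si_mean, 2.0, new_frames))
```

With no adaptation frames the estimate stays at the prior mean, and as the number of frames grows it approaches the sample mean. That behaviour is consistent with the abstract's observation that adaptation helps most when speaker-specific data is scarce.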
Pages: 806-814
Page count: 9