SPEAKER ADAPTATION USING CONSTRAINED ESTIMATION OF GAUSSIAN MIXTURES

Cited by: 230
Authors
DIGALAKIS, VV
RTISCHEV, D
NEUMEYER, LG
Affiliation
[1] SRI International, Park
Source
IEEE Transactions on Speech and Audio Processing | 1995 / Vol. 3 / No. 5
DOI
10.1109/89.466659
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
A recent trend in automatic speech recognition systems is the use of continuous mixture-density hidden Markov models (HMM's). Despite the good recognition performance that these systems achieve on average in large-vocabulary applications, there is a large variability in performance across speakers. Performance degrades dramatically when the user is radically different from the training population. A popular technique that can improve the performance and robustness of a speech recognition system is adapting speech models to the speaker, and more generally to the channel and the task. In continuous mixture-density HMM's, the number of component densities is typically very large, and it may not be feasible to acquire a sufficient amount of adaptation data for robust maximum-likelihood estimates. To solve this problem, we propose a constrained estimation technique for Gaussian mixture densities. The algorithm is evaluated on the large-vocabulary Wall Street Journal corpus for both native and nonnative speakers of American English. For nonnative speakers, the recognition error rate is approximately halved with only a small amount of adaptation data, and it approaches the speaker-independent accuracy achieved for native speakers. For native speakers, the recognition performance after adaptation improves to the accuracy of speaker-dependent systems that use six times as much training data.
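To illustrate why constraining the estimation helps when adaptation data is scarce, here is a minimal one-dimensional sketch. It assumes that "constrained estimation" means tying every mixture component to one shared affine transform of the speaker-independent (SI) parameters: adapted mean a*mu_k + b and adapted variance a^2*var_k, with the SI mixture weights held fixed. The abstract does not spell out the constraint, so this form is our assumption, not the paper's exact algorithm. Only two parameters (a, b) are estimated by EM, however many components the mixture has. The function names `adapt_shared_affine` and `sample_speaker` and the toy SI parameters are hypothetical.

```python
import math
import random


def gauss_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian N(mean, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def sample_speaker(n, weights, means, variances, a, b, seed=0):
    """Hypothetical test data: draw z from the SI mixture, emit x = a*z + b."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        k = rng.choices(range(len(weights)), weights=weights)[0]
        z = rng.gauss(means[k], math.sqrt(variances[k]))
        out.append(a * z + b)
    return out


def adapt_shared_affine(data, weights, means, variances, iters=30):
    """EM estimate of one transform (a, b) shared by all mixture components.

    Assumed constraint: component k becomes N(a*mu_k + b, a^2 * var_k);
    mixture weights stay at their SI values. Returns (a, b).
    """
    a, b = 1.0, 0.0  # start from the unadapted SI model
    n = len(data)
    for _ in range(iters):
        # E-step: accumulate sufficient statistics weighted by gamma_tk / var_k.
        s0 = sx = sxx = smu = sxmu = 0.0
        for x in data:
            logs = [math.log(w) + gauss_logpdf(x, a * m + b, a * a * v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logs)  # subtract the max for numerical stability
            probs = [math.exp(l - top) for l in logs]
            total = sum(probs)
            for p, m, v in zip(probs, means, variances):
                g = p / total / v  # posterior gamma_tk divided by var_k
                s0 += g
                sx += g * x
                sxx += g * x * x
                smu += g * m
                sxmu += g * x * m
        # M-step in the inverse form z = c*x + d (c = 1/a, d = -b/a):
        # maximise n*log(c) - 0.5 * sum g*(c*x + d - mu)^2 in closed form.
        # Optimising d first leaves the quadratic alpha*c^2 - beta*c - n = 0.
        alpha = sxx - sx * sx / s0
        beta = sxmu - smu * sx / s0
        # Keep the positive root, i.e. assume a > 0.
        c = (beta + math.sqrt(beta * beta + 4.0 * alpha * n)) / (2.0 * alpha)
        d = (smu - c * sx) / s0
        a, b = 1.0 / c, -d / c
    return a, b


if __name__ == "__main__":
    w, mu, var = [0.3, 0.5, 0.2], [-2.0, 0.0, 3.0], [0.5, 1.0, 0.8]
    data = sample_speaker(5000, w, mu, var, a=1.5, b=0.7)
    print(adapt_shared_affine(data, w, mu, var))
```

With synthetic "speaker" data generated by a known transform (a = 1.5, b = 0.7), the estimates should land close to those values. An unconstrained re-estimation would need enough data for every component's mean and variance separately; sharing one transform is what keeps the estimate stable with little data.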
Pages: 357-366
Page count: 10