SPEAKER ADAPTATION USING CONSTRAINED ESTIMATION OF GAUSSIAN MIXTURES

Times cited: 230

Authors:

DIGALAKIS, VV

RTISCHEV, D

NEUMEYER, LG

Affiliation:

[1] SRI International, Park

Source:

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 1995 / Vol. 3 / No. 5

DOI:

10.1109/89.466659

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

A recent trend in automatic speech recognition systems is the use of continuous mixture-density hidden Markov models (HMM's). Despite the good recognition performance that these systems achieve on average in large-vocabulary applications, there is a large variability in performance across speakers. Performance degrades dramatically when the user is radically different from the training population. A popular technique that can improve the performance and robustness of a speech recognition system is adapting the speech models to the speaker, and more generally to the channel and the task. In continuous mixture-density HMM's the number of component densities is typically very large, and it may not be feasible to acquire a sufficient amount of adaptation data for robust maximum-likelihood estimates. To solve this problem, we propose a constrained estimation technique for Gaussian mixture densities. The algorithm is evaluated on the large-vocabulary Wall Street Journal corpus for both native and nonnative speakers of American English. For nonnative speakers, the recognition error rate is approximately halved with only a small amount of adaptation data, and it approaches the speaker-independent accuracy achieved for native speakers. For native speakers, the recognition performance after adaptation improves to the accuracy of speaker-dependent systems that use six times as much training data.
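The abstract does not spell out the form of the constraint. The sketch below assumes the form usually associated with this work: every Gaussian of the speaker-independent (SI) mixture is adapted through one shared affine transform, so a mean mu becomes a*mu + b and a variance sigma^2 becomes a^2*sigma^2. As further simplifying assumptions, covariances are diagonal and the transform is diagonal (a per-dimension scale a and offset b). Tying the transform is what lets sparse adaptation data go further: adapting K diagonal Gaussians of dimension D independently means estimating 2KD mean and variance parameters, while the tied transform has only 2D. The helper names `gaussian_posteriors` and `estimate_shared_transform` are hypothetical; this is an illustrative EM sketch, not the authors' implementation.

```python
import numpy as np


def gaussian_posteriors(y, weights, means, variances):
    """Posterior gamma[t, k] of component k for frame t under a diagonal-covariance GMM."""
    diff = y[:, None, :] - means[None, :, :]                      # (T, K, D)
    log_det = np.sum(np.log(2.0 * np.pi * variances), axis=1)     # (K,)
    log_lik = -0.5 * (log_det[None, :] + np.sum(diff**2 / variances[None, :, :], axis=2))
    log_joint = np.log(weights)[None, :] + log_lik                # (T, K)
    log_joint -= log_joint.max(axis=1, keepdims=True)             # avoid underflow
    gamma = np.exp(log_joint)
    return gamma / gamma.sum(axis=1, keepdims=True)


def estimate_shared_transform(x, weights, means, variances, n_iter=20):
    """EM estimate of one diagonal affine transform shared by all SI Gaussians.

    Assumed adapted model: mean_k -> a * mean_k + b, var_k -> a**2 * var_k,
    with the same per-dimension a, b for every component k (hypothetical helper).
    """
    dim = x.shape[1]
    a, b = np.ones(dim), np.zeros(dim)                            # start from the SI model
    inv_var = 1.0 / variances                                     # (K, D)
    mu_iv = means * inv_var                                       # (K, D)
    for _ in range(n_iter):
        # E-step: posteriors under the adapted model equal SI posteriors of
        # y = (x - b) / a, because the Jacobian 1/|a| is common to all components.
        gamma = gaussian_posteriors((x - b) / a, weights, means, variances)
        # M-step in the inverse form y = c * x + e with c = 1/a, e = -b/a.
        occ = gamma.sum()                                         # total occupancy (= T)
        w = gamma.sum(axis=0) @ inv_var                           # sum gamma / var
        s_x = np.einsum("tk,kd,td->d", gamma, inv_var, x)         # sum gamma x / var
        s_xx = np.einsum("tk,kd,td->d", gamma, inv_var, x**2)     # sum gamma x^2 / var
        s_mu = gamma.sum(axis=0) @ mu_iv                          # sum gamma mu / var
        s_xmu = np.einsum("tk,kd,td->d", gamma, mu_iv, x)         # sum gamma x mu / var
        # Zero gradient gives e = (s_mu - c * s_x) / w and p*c^2 + q*c - occ = 0;
        # the positive root keeps the scale c (and hence a) positive.
        p = s_xx - s_x**2 / w
        q = s_mu * s_x / w - s_xmu
        c = (-q + np.sqrt(q**2 + 4.0 * p * occ)) / (2.0 * p)
        e = (s_mu - c * s_x) / w
        a, b = 1.0 / c, -e / c
    return a, b
```

Working in the inverse parameterization is the design choice that keeps the M-step closed-form: the auxiliary function becomes log c minus a weighted squared error in c*x + e, so the offset is linear in the scale and the scale solves a per-dimension quadratic. On synthetic data drawn from a transformed mixture, the returned a and b should approach the transform used to generate it.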

Pages: 357-366

Number of pages: 10
