Mean and variance adaptation within the MLLR framework

Cited by: 251

Authors:

Gales, MJF

Woodland, PC

Affiliation:

[1] Cambridge University Engineering Department, Trumpington Street, Cambridge CB2 1PZ

Source:

COMPUTER SPEECH AND LANGUAGE | 1996 / Vol. 10 / No. 4

DOI:

10.1006/csla.1996.0013

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

One of the key issues for adaptation algorithms is to modify a large number of parameters with only a small amount of adaptation data. Speaker adaptation techniques try to obtain near speaker-dependent (SD) performance with only small amounts of speaker-specific data, and are often based on initial speaker-independent (SI) recognition systems. Some of these speaker adaptation techniques may also be applied to the task of adaptation to a new acoustic environment. In this case, an SI recognition system trained in, typically, a clean acoustic environment is adapted to operate in a new, noise-corrupted acoustic environment. This paper examines the maximum likelihood linear regression (MLLR) adaptation technique. MLLR estimates linear transformations for groups of model parameters to maximize the likelihood of the adaptation data. Previously, MLLR has been applied to the mean parameters in mixture-Gaussian HMM systems. In this paper, MLLR is extended to also update the Gaussian variances, and re-estimation formulae are derived for these variance transforms. MLLR with variance compensation is evaluated on several large vocabulary recognition tasks. The use of mean and variance MLLR adaptation was found to give an additional 2% to 7% decrease in word error rate over mean-only MLLR adaptation. (C) 1996 Academic Press Limited
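
The abstract does not spell out the transform itself. As a rough illustration only, the sketch below assumes the usual mean-only MLLR form, in which every Gaussian mean in a group is mapped through one shared affine transform (adapted mean = A * mean + b). The function name `apply_mean_transform` and the values of `A`, `b`, and `si_means` are hypothetical and chosen for the example. Estimating the transform from adaptation data, and the variance transform derived in the paper, are not shown.

```python
# Illustrative sketch only: one shared affine transform applied to Gaussian means.
# All names and numbers here are hypothetical, not taken from the paper.

def apply_mean_transform(means, A, b):
    """Return the adapted mean A * mu + b for every mean vector in `means`."""
    adapted = []
    for mu in means:
        adapted.append([
            sum(a_ij * mu_j for a_ij, mu_j in zip(row, mu)) + b_i
            for row, b_i in zip(A, b)
        ])
    return adapted

# Two 2-dimensional SI means that share a single transform (one group).
si_means = [[1.0, 2.0], [0.0, -1.0]]
A = [[1.0, 0.5], [0.0, 2.0]]   # linear part of the transform
b = [0.5, -0.5]                # bias (offset) part of the transform

adapted_means = apply_mean_transform(si_means, A, b)
print(adapted_means)  # [[2.5, 3.5], [0.0, -2.5]]
```

Because the transform is shared across a group of Gaussians, far fewer parameters need to be estimated than if every mean were updated independently, which is what makes adaptation feasible with little data.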

Pages: 249-264

Page count: 16
