Discounted likelihood linear regression for rapid speaker adaptation

Cited by: 3
Authors
Gunawardana, A [1]
Byrne, W [1]
Affiliations
[1] Johns Hopkins Univ, Dept Elect & Comp Engn, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
DOI
10.1006/csla.2000.0151
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The widely used maximum likelihood linear regression speaker adaptation procedure suffers from overtraining when used for rapid adaptation tasks in which the amount of adaptation data is severely limited. This is a well-known difficulty associated with the expectation maximization algorithm. We use an information geometric analysis of the expectation maximization algorithm as an alternating minimization of a Kullback-Leibler-type divergence to see the cause of this difficulty, and propose a more robust discounted likelihood estimation procedure. This gives rise to a discounted likelihood linear regression procedure, which is a variant of maximum likelihood linear regression suited for small adaptation sets. Our procedure is evaluated on an unsupervised rapid adaptation task defined on the Switchboard conversational telephone speech corpus, where our proposed procedure improves word error rate by 1.6% (absolute) with as little as 5 seconds of adaptation data, which is a situation in which maximum likelihood linear regression overtrains in the first iteration of adaptation. We compare several realizations of discounted likelihood linear regression with maximum likelihood linear regression and other simple maximum likelihood linear regression variants, and discuss issues that arise in implementing our discounted likelihood procedures. © 2001 Academic Press.
Pages: 15-38
Page count: 24