Rapid speaker adaptation in eigenvoice space

Cited by: 314
Authors
Kuhn, R [1]
Junqua, JC [1]
Nguyen, P [1]
Niedzielski, N [1]
Affiliations
[1] Panasonic Technologies Inc., Panasonic Speech Technology Laboratory, Santa Barbara, CA 93105 USA
Source
IEEE Transactions on Speech and Audio Processing | 2000 / Vol. 8 / No. 6
Keywords
eigenvoice approach; principal component analysis; speaker adaptation; speaker clustering
DOI
10.1109/89.876308
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper describes a new model-based speaker adaptation algorithm called the eigenvoice approach. The approach constrains the adapted model to be a linear combination of a small number of basis vectors obtained offline from a set of reference speakers, and thus greatly reduces the number of free parameters to be estimated from adaptation data. These "eigenvoice" basis vectors are orthogonal to each other and guaranteed to represent the most important components of variation between the reference speakers. Experimental results for a small-vocabulary task (letter recognition) given in the paper show that the approach yields major improvements in performance for tiny amounts of adaptation data. For instance, we obtained 16% relative improvement in error rate with one letter of supervised adaptation data, and 26% relative improvement with four letters of supervised adaptation data. After a comparison of the eigenvoice approach with other speaker adaptation algorithms, the paper concludes with a discussion of future work.
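The core idea in the abstract (an adapted model constrained to a linear combination of a few orthogonal basis vectors learned offline) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names `train_eigenvoices` and `adapt` are hypothetical, each speaker model is flattened into a single parameter vector, and a plain least-squares fit stands in for the weight estimator, which the abstract does not specify.

```python
import numpy as np

def train_eigenvoices(supervectors, k):
    """Offline step: PCA over reference-speaker parameter vectors (one per row).

    Returns the mean vector and the top-k principal directions, which are
    orthonormal and ordered by the variance they explain.
    """
    mean = supervectors.mean(axis=0)
    centered = supervectors - mean
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]

def adapt(mean, eigenvoices, observed, mask):
    """Adaptation step: estimate k weights from the observed dimensions only.

    Assumption: least squares replaces the estimator used in the paper.
    `mask` marks which parameters the adaptation data gives evidence for.
    """
    a = eigenvoices[:, mask].T            # shape (n_observed, k)
    b = observed[mask] - mean[mask]
    weights, *_ = np.linalg.lstsq(a, b, rcond=None)
    # The adapted model is the mean plus a weighted sum of eigenvoices.
    return mean + weights @ eigenvoices, weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    basis = rng.normal(size=(2, 12))                 # synthetic 2-D speaker space
    refs = 5.0 + rng.normal(size=(20, 2)) @ basis    # 20 synthetic reference speakers
    mean, ev = train_eigenvoices(refs, k=2)
    new_speaker = 5.0 + np.array([0.7, -1.2]) @ basis
    mask = np.zeros(12, dtype=bool)
    mask[:4] = True                                  # only 4 of 12 parameters observed
    adapted, weights = adapt(mean, ev, new_speaker, mask)
    print("max reconstruction error:", np.abs(adapted - new_speaker).max())
```

Only `k` weights are estimated rather than the full parameter vector, which is why very little adaptation data can suffice in this kind of scheme; the synthetic data here is constructed so that speakers lie exactly in a two-dimensional subspace, which real speaker models would not.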
Pages: 695-707
Page count: 13