SPEAKER IDENTIFICATION AND VERIFICATION USING GAUSSIAN MIXTURE SPEAKER MODELS

Cited by: 710
Author
REYNOLDS, DA
Affiliation
[1] MIT Lincoln Laboratory, Lexington, MA 02173
Keywords
AUTOMATIC SPEAKER IDENTIFICATION AND VERIFICATION; TEXT-INDEPENDENT; VOCABULARY-DEPENDENT; GAUSSIAN MIXTURE SPEAKER MODELS; TIMIT; NTIMIT; SWITCHBOARD; YOHO;
DOI
10.1016/0167-6393(95)00009-D
Chinese Library Classification
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
This paper presents high-performance speaker identification and verification systems based on Gaussian mixture speaker models: robust, statistically based representations of speaker identity. The identification system is a maximum likelihood classifier and the verification system is a likelihood ratio hypothesis tester using background speaker normalization. The systems are evaluated on four publicly available speech databases: TIMIT, NTIMIT, Switchboard and YOHO. The different levels of degradations and variabilities found in these databases allow the examination of system performance for different task domains. Constraints on the speech range from vocabulary-dependent to extemporaneous, and speech quality varies from near-ideal, clean speech to noisy, telephone speech. Closed-set identification accuracies on the 630-speaker TIMIT and NTIMIT databases were 99.5% and 60.7%, respectively. On a 113-speaker population from the Switchboard database the identification accuracy was 82.8%. Global threshold equal error rates of 0.24%, 7.19%, 5.15% and 0.51% were obtained in verification experiments on the TIMIT, NTIMIT, Switchboard and YOHO databases, respectively.
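The two decision rules named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`gmm_avg_loglik`, `identify`, `llr`), the diagonal-covariance parameterization, the per-frame averaging, and the way background speaker scores are combined (log of the mean background likelihood) are all assumptions made for illustration. Each model is assumed to be a tuple of mixture weights, component means, and component variances.

```python
import numpy as np

def gmm_avg_loglik(X, weights, means, variances):
    """Average per-frame log-likelihood of frames X (T, D) under a
    diagonal-covariance GMM with M components.
    weights: (M,), means: (M, D), variances: (M, D)."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - means[None, :, :]                    # (T, M, D)
    # Log density of each frame under each Gaussian component.
    log_comp = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_weighted = log_comp + np.log(weights)                   # (T, M)
    # Log-sum-exp over components for numerical stability.
    peak = log_weighted.max(axis=1, keepdims=True)
    log_frame = peak[:, 0] + np.log(np.exp(log_weighted - peak).sum(axis=1))
    return float(log_frame.mean())

def identify(X, models):
    """Maximum likelihood classification: return the key of the speaker
    model that scores the frames highest (closed-set identification)."""
    return max(models, key=lambda name: gmm_avg_loglik(X, *models[name]))

def llr(X, claimed_model, background_models):
    """Log-likelihood ratio of the claimed speaker against a set of
    background speakers. Combining background scores as the log of their
    mean likelihood is an assumed normalization, not taken from the source."""
    claimed = gmm_avg_loglik(X, *claimed_model)
    bg = np.array([gmm_avg_loglik(X, *m) for m in background_models])
    peak = bg.max()
    background = peak + np.log(np.mean(np.exp(bg - peak)))
    return claimed - background
```

In a verification setting, a claim would be accepted when `llr(...)` exceeds a threshold; sweeping a single threshold across all speakers until the false-acceptance and false-rejection rates match gives a global threshold equal error rate of the kind reported above.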
Pages: 91-108
Page count: 18