Rapid speaker ID using discrete MMI feature quantisation

Cited by: 1
Author
Foote, JT [1]
Affiliation
[1] Univ Cambridge, Dept Engn, Cambridge CB2 1PZ, England
DOI
10.1016/S0957-4174(97)00051-1
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a method of rapidly determining speaker identity from a small sample of speech, using a tree-based vector quantiser trained to maximise mutual information (MMI). The method is text-independent, and new speakers may be rapidly enrolled. Unlike most conventional hidden Markov model approaches, this method is computationally inexpensive enough to run on a modest integer microprocessor, yet it remains robust with only a small amount of test data. Speaker identification is thus rapid in two senses: the computational cost is low, and only a small amount of test speech is needed to identify the speaker. This paper presents theoretical and experimental results, showing that perfect ID accuracy may be achieved on a 15-speaker corpus using little more than 1 s of text-independent test speech. Also presented is a demonstration of how this method may be used to segment audio data by speaker. (C) 1998 Elsevier Science Ltd. All rights reserved.
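The abstract does not give the training procedure, so the following is only a minimal sketch of what "trained to maximise mutual information" could mean at a single tree node. It assumes, hypothetically, that a node splits one scalar feature at a threshold and that the threshold is chosen to maximise the mutual information between the speaker label and the branch taken. The function names `mutual_information` and `best_threshold` and the toy data are illustrative, not taken from the paper.

```python
import math
from collections import Counter


def mutual_information(labels, sides):
    """Estimate I(label; side) in bits from paired observations."""
    n = len(labels)
    joint = Counter(zip(labels, sides))
    label_counts = Counter(labels)
    side_counts = Counter(sides)
    mi = 0.0
    for (label, side), count in joint.items():
        p_joint = count / n
        # p(label, side) / (p(label) * p(side)), written with raw counts
        ratio = count * n / (label_counts[label] * side_counts[side])
        mi += p_joint * math.log2(ratio)
    return mi


def best_threshold(values, labels):
    """Scan midpoints between distinct sorted values (assumed split rule).

    Returns (threshold, mutual_information) for the best split found,
    or (None, -1.0) if there are fewer than two distinct values.
    """
    distinct = sorted(set(values))
    best = (None, -1.0)
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        sides = [v > threshold for v in values]
        mi = mutual_information(labels, sides)
        if mi > best[1]:
            best = (threshold, mi)
    return best


# Toy data (hypothetical): one scalar feature per frame, two speakers.
values = [0.1, 0.2, 0.3, 0.9, 1.0, 1.1]
labels = ["a", "a", "a", "b", "b", "b"]
threshold, mi = best_threshold(values, labels)
print(threshold, mi)
```

On this toy data the two speakers are perfectly separable, so the best split carries one full bit of information about speaker identity. A full tree would presumably apply such a split recursively over several feature dimensions; that extension is not shown here.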
Pages: 283-289
Page count: 7