Approaches to speaker detection and tracking in conversational speech

Cited by: 25
Authors
Dunn, RB [1]
Reynolds, DA [1]
Quatieri, TF [1]
Affiliation
[1] MIT, Lincoln Lab, Speech Syst Technol Grp, Lexington, MA 02173 USA
Keywords
speaker recognition; detection; tracking; multispeaker; Gaussian mixture model; clustering
DOI
10.1006/dspr.1999.0359
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract
Two approaches to detecting and tracking speakers in multispeaker audio are described. Both approaches use an adapted Gaussian mixture model-universal background model (GMM-UBM) speaker detection system as the core speaker recognition engine. In one approach, the individual log-likelihood ratio scores, which the GMM-UBM system produces on a frame-by-frame basis, are used first to partition the speech file into speaker-homogeneous regions and then to create scores for these regions. We refer to this approach as internal segmentation. The other approach uses an external segmentation algorithm, based on blind clustering, to partition the speech file into speaker-homogeneous regions. The adapted GMM-UBM system then scores each of these regions as in the single-speaker recognition case. We show that the external segmentation system outperforms the internal segmentation system for both detection and tracking. In addition, we show how different components of the detection and tracking algorithms contribute to the overall system performance. © 2000 Academic Press.
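The internal segmentation idea in the abstract can be illustrated with a small sketch. The abstract does not say how frame scores are turned into regions, so the moving-average smoothing, the fixed threshold, and the mean-LLR region score below are assumptions made for illustration only, as are all function and parameter names. The models are toy diagonal-covariance GMMs given as `(weights, means, variances)` tuples; this is not the authors' implementation.

```python
import math


def gmm_frame_loglik(x, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM."""
    comp = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            ll += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        comp.append(ll)
    # Log-sum-exp over mixture components for numerical stability.
    m = max(comp)
    return m + math.log(sum(math.exp(c - m) for c in comp))


def frame_llrs(frames, target, ubm):
    """Per-frame log-likelihood ratio: log p(x | target) - log p(x | UBM)."""
    return [
        gmm_frame_loglik(x, *target) - gmm_frame_loglik(x, *ubm)
        for x in frames
    ]


def smooth(values, half_window):
    """Moving average over a window of up to 2 * half_window + 1 frames."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def segment_and_score(llrs, half_window=2, threshold=0.0):
    """Find runs of frames whose smoothed LLR exceeds the threshold.

    Returns a list of (start, end, mean_llr) tuples with `end` exclusive.
    The smoothing and thresholding rule is an assumption, not the paper's.
    """
    smoothed = smooth(llrs, half_window)
    regions = []
    start = None
    # A trailing sentinel closes a run that extends to the last frame.
    for i, s in enumerate(smoothed + [float("-inf")]):
        if s > threshold and start is None:
            start = i
        elif s <= threshold and start is not None:
            regions.append((start, i, sum(llrs[start:i]) / (i - start)))
            start = None
    return regions
```

Calling `frame_llrs(frames, target, ubm)` and passing the result to `segment_and_score` yields candidate regions attributed to the target speaker, each with a region-level score. The external segmentation approach would instead obtain the regions from a separate blind clustering step and score each region with the same GMM-UBM ratio.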
Pages: 93-112
Page count: 20