Approaches to speaker detection and tracking in conversational speech

Cited by: 25

Authors:

Dunn, RB [1]

Reynolds, DA [1]

Quatieri, TF [1]

Affiliation:

[1] MIT, Lincoln Lab, Speech Syst Technol Grp, Lexington, MA 02173 USA

Source:

DIGITAL SIGNAL PROCESSING | 2000 / Vol. 10 / Issues 1-3

Keywords:

speaker recognition; detection; tracking; multispeaker; Gaussian mixture model; clustering

DOI:

10.1006/dspr.1999.0359

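Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology]

Discipline classification code:

0808; 0809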

Abstract:

Two approaches to detecting and tracking speakers in multispeaker audio are described. Both approaches use an adapted Gaussian mixture model, universal background model (GMM-UBM) speaker detection system as the core speaker recognition engine. In one approach, the individual log-likelihood ratio scores, which are produced on a frame-by-frame basis by the GMM-UBM system, are used first to partition the speech file into speaker-homogeneous regions and then to create scores for these regions. We refer to this approach as internal segmentation. The other approach uses an external segmentation algorithm, based on blind clustering, to partition the speech file into speaker-homogeneous regions. The adapted GMM-UBM system then scores each of these regions as in the single-speaker recognition case. We show that the external segmentation system outperforms the internal segmentation system for both detection and tracking. In addition, we show how different components of the detection and tracking algorithms contribute to the overall system performance. © 2000 Academic Press.
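The internal segmentation idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how frame scores are turned into regions, so the moving-average smoothing, the fixed threshold, the mean-score region scoring, and every name below (`gmm_frame_loglik`, `frame_llr`, `segment_by_llr`, `window`, `threshold`) are assumptions made for illustration only. Each model is represented as a hypothetical `(weights, means, variances)` tuple for a diagonal-covariance GMM.

```python
import numpy as np


def gmm_frame_loglik(frames, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D); weights: (M,); means and variances: (M, D).
    """
    diff = frames[:, None, :] - means[None, :, :]                       # (T, M, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)   # (M,)
    log_comp = log_norm[None, :] - 0.5 * np.sum(
        diff ** 2 / variances[None, :, :], axis=2
    )                                                                   # (T, M)
    log_weighted = log_comp + np.log(weights)[None, :]
    # Numerically stable log-sum-exp over mixture components.
    peak = log_weighted.max(axis=1, keepdims=True)
    return (peak + np.log(np.exp(log_weighted - peak).sum(axis=1, keepdims=True)))[:, 0]


def frame_llr(frames, speaker, ubm):
    """Frame-by-frame log-likelihood ratio: speaker model versus background model."""
    return gmm_frame_loglik(frames, *speaker) - gmm_frame_loglik(frames, *ubm)


def segment_by_llr(llr, window=5, threshold=0.0):
    """Assumed segmentation rule: smooth the frame scores, threshold them, and
    return (start, end, mean_llr) for each run of frames above the threshold.
    `end` is exclusive."""
    kernel = np.ones(window) / window
    smoothed = np.convolve(llr, kernel, mode="same")
    active = smoothed > threshold
    regions, start = [], None
    for t, flag in enumerate(active):
        if flag and start is None:
            start = t
        elif not flag and start is not None:
            regions.append((start, t, float(llr[start:t].mean())))
            start = None
    if start is not None:
        regions.append((start, len(llr), float(llr[start:].mean())))
    return regions
```

A caller would compute `llr = frame_llr(frames, speaker, ubm)` for one target speaker and pass it to `segment_by_llr`. The external segmentation approach would instead obtain the regions from a separate clustering step and then score each region with the same log-likelihood ratio.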


Pages: 93-112

Page count: 20
