The NIST 1999 Speaker Recognition Evaluation - An overview

Cited by: 74

Authors:

Martin, A [1]

Przybocki, M [1]

Affiliation:

[1] NIST, Gaithersburg, MD 20899 USA

Source:

DIGITAL SIGNAL PROCESSING | 2000, Vol. 10, No. 1-3

Keywords:

speaker recognition; speaker verification; speaker detection; speaker tracking; DET curve; NIST evaluation

DOI:

10.1006/dspr.1999.0355

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronics and communication technology]

Subject classification codes:

0808; 0809

Abstract:

This article summarizes the 1999 NIST Speaker Recognition Evaluation. It discusses the overall research objectives, the three task definitions, the development and evaluation data sets, the specified performance measures and their manner of presentation, and the overall quality of the results. More than a dozen sites from the United States, Europe, and Asia participated in this evaluation. There were three primary tasks for which automatic systems could be designed: one-speaker detection, two-speaker detection, and speaker tracking. All three tasks were performed in the context of mu-law encoded conversational telephone speech. The one-speaker detection task used single-channel data, while the other two tasks used summed two-channel data. About 500 target speakers were specified, with 2 min of training speech data provided for each. Both multiple-speaker and single-speaker test segments were selected from about 2000 conversations that were not used as training material. The duration of the multiple-speaker test segments was nominally 1 min, while the duration of the single-speaker test segments varied from near zero up to 60 s. For each task, systems had to make independent decisions for selected combinations of a test segment and a hypothesized target speaker. The data sets for each task were designed to be large enough to provide statistically meaningful results on test subsets of interest. Results were analyzed with respect to various conditions, including duration, pitch differences, and handset types. © 2000 Academic Press.

Pages: 1-18

Page count: 18

References (7 total)

[1] The ELISA Systems for the NIST'99 evaluation in speaker detection and tracking [J].

Bimbot, F; Blouet, R; Bonastre, JF; Caloz, G; Cernocky, J; Chollet, G; Durou, G; Fredouille, C; Genoud, D; Gravier, G; Hennebert, J; Kharroubi, J; Magrin-Chagnolleau, I; Merlin, T; Mokbel, C; Nedic, B; Petrovska-Delacrétaz, D; Pigeon, S; Seck, M; Verlinde, P; Zouhal, M.

DIGITAL SIGNAL PROCESSING, 2000, 10 (1-3): 143-153

[2] DODDINGTON G, IN PRESS SPEECH COMM

[3] Approaches to speaker detection and tracking in conversational speech [J].

Dunn, RB; Reynolds, DA; Quatieri, TF.

DIGITAL SIGNAL PROCESSING, 2000, 10 (1-3): 93-112

[4] Martin A., 1997, P EUR C SPEECH COMM, P1895

[5] PRZYBOCKI M, 1998, RLA2C PRES AV APR

[6] PRZYBOCKI M, 1998, RLA2C AV APR, P120

[7] Quatieri TF, 1998, INT CONF ACOUST SPEE, P745, DOI 10.1109/ICASSP.1998.675372
