Discriminative keyword spotting

Cited by: 74

Authors:

Keshet, Joseph ^{[1]}

Grangier, David ^{[2]}

Bengio, Samy ^{[3]}

Affiliations:

[1] IDIAP Res Inst, CH-1920 Martigny, Switzerland

[2] NEC Labs Amer, Princeton, NJ 08540 USA

[3] Google Inc, Mountain View, CA 94043 USA

Source:

SPEECH COMMUNICATION | 2009 / Vol. 51 / Issue 04

Keywords:

Keyword spotting; Spoken term detection; Speech recognition; Large margin and kernel methods; Support vector machines; Discriminative models;

DOI:

10.1016/j.specom.2008.10.002

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

This paper proposes a new approach for keyword spotting, which is based on large margin and kernel methods rather than on HMMs. Unlike previous approaches, the proposed method employs a discriminative learning procedure, in which the learning phase aims at achieving a high area under the ROC curve, as this quantity is the most common measure to evaluate keyword spotters. The keyword spotter we devise is based on mapping the input acoustic representation of the speech utterance along with the target keyword into a vector-space. Building on techniques used for large margin and kernel methods for predicting whole sequences, our keyword spotter distills to a classifier in this vector-space, which separates speech utterances in which the keyword is uttered from speech utterances in which the keyword is not uttered. We describe a simple iterative algorithm for training the keyword spotter and discuss its formal properties, showing theoretically that it attains high area under the ROC curve. Experiments on read speech with the TIMIT corpus show that the resulting discriminative system outperforms the conventional context-independent HMM-based system. Further experiments using the TIMIT trained model, but tested on both read (HTIMIT, WSJ) and spontaneous speech (OGI Stories), show that without further training or adaptation to the new corpus our discriminative system outperforms the conventional context-independent HMM-based system. © 2008 Elsevier B.V. All rights reserved.
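The evaluation measure named in the abstract, the area under the ROC curve (AUC), can be read as the fraction of (positive, negative) utterance pairs in which the utterance containing the keyword receives the higher score. The sketch below illustrates only that pairwise reading. It is not the training algorithm of the paper; the function name `pairwise_auc` and the example scores are hypothetical and chosen for illustration.

```python
def pairwise_auc(pos_scores, neg_scores):
    """Estimate AUC as the fraction of correctly ordered pairs.

    pos_scores: scores for utterances that contain the keyword.
    neg_scores: scores for utterances that do not contain it.
    A tie between a positive and a negative score counts as half.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")

    correct = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                correct += 1.0   # positive utterance ranked above negative
            elif p == n:
                correct += 0.5   # tie: count as half-correct
    return correct / (len(pos_scores) * len(neg_scores))


# Hypothetical spotter scores, for illustration only.
pos = [2.1, 0.4, 1.7]
neg = [0.3, 1.9, -0.5]
auc = pairwise_auc(pos, neg)
```

Under this reading, a learning procedure that pushes each positive utterance's score above each negative utterance's score directly raises the AUC, which is the motivation the abstract gives for its discriminative objective.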


Pages: 317-329

Page count: 13

