A frequency warping approach to speaker normalization

Cited by: 218

Authors:

Lee, L [1]

Rose, R [1]

Affiliation:

[1] MIT, Cambridge, MA 02139 USA

Source:

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 1998, Vol. 6, No. 01

Keywords:

continuous speech recognition; frequency warping; hidden Markov modeling; speaker normalization;

DOI:

10.1109/89.650310

Chinese Library Classification:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

In an effort to reduce the degradation in speech recognition performance caused by variation in vocal tract shape among speakers, a frequency warping approach to speaker normalization is investigated. A set of low-complexity, maximum-likelihood-based frequency warping procedures has been applied to speaker normalization for a telephone-based connected digit recognition task. This paper presents an efficient means for estimating a linear frequency warping factor and a simple mechanism for implementing frequency warping by modifying the filterbank in mel-frequency cepstrum feature analysis. An experimental study comparing these techniques to other well-known techniques for reducing variability is described. The results show that frequency warping is consistently able to reduce word error rate by 20%, even for very short utterances.
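The two ideas named in the abstract, warping implemented inside the mel filterbank and a maximum-likelihood choice of a linear warping factor, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the mel-scale formula, the clamping at the Nyquist frequency, the grid of candidate factors, and the `log_likelihood` callable are all assumptions made for illustration. In a real recognizer, `log_likelihood` would score the warped features of an utterance against the HMMs; here it is a hypothetical stand-in.

```python
import math

def hz_to_mel(f_hz):
    # A common mel-scale formula (assumed; the record does not specify one).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def warped_mel_filterbank(n_filters, n_fft, sample_rate, alpha=1.0):
    """Triangular mel filters whose edge frequencies are scaled by alpha."""
    nyquist = sample_rate / 2.0
    n_bins = n_fft // 2 + 1
    top_mel = hz_to_mel(nyquist)
    # n_filters + 2 edge points, equally spaced on the mel scale.
    edges_hz = [mel_to_hz(top_mel * i / (n_filters + 1))
                for i in range(n_filters + 2)]
    # Linear warping: scale every edge by alpha; clamping at Nyquist is a
    # simplification chosen for this sketch.
    edges_hz = [min(alpha * f, nyquist) for f in edges_hz]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def pick_warp_factor(log_likelihood, alphas):
    """Grid search: return the candidate alpha with the highest score."""
    return max(alphas, key=log_likelihood)

# Illustrative grid of candidate factors (an assumption, not from the abstract).
alphas = [0.88 + 0.02 * i for i in range(13)]
# Hypothetical score that peaks near 1.04, standing in for an HMM likelihood.
best_alpha = pick_warp_factor(lambda a: -(a - 1.04) ** 2, alphas)
bank = warped_mel_filterbank(10, 256, 8000, alpha=best_alpha)
```

Because only the filter edges change, the short-time spectrum is computed once per frame and the warping costs nothing beyond swapping the filterbank matrix, which is one way to read the abstract's "simple mechanism".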

Citation

Pages: 49-60

Page count: 12

References: 14 in total

[1] ANDREOU A, 1994, P CAIP WORKSH FRONT, V2

[2] DAVIS, SB; MERMELSTEIN, P. COMPARISON OF PARAMETRIC REPRESENTATIONS FOR MONOSYLLABIC WORD RECOGNITION IN CONTINUOUSLY SPOKEN SENTENCES [J]. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1980, 28 (04): 357-366

[3] FANT G, 1975, STL QPSR, V2, P1

[4] Gauvain, Jean-Luc; Lee, Chin-Hui. Maximum a Posteriori Estimation for Multivariate Gaussian Mixture Observations of Markov Chains [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1994, 2 (02): 291-298

[5] Lee L., P ICASSP 96, P353

[6] MARKEL, JD; OSHIKA, BT; GRAY, AH. LONG-TERM FEATURE AVERAGING FOR SPEAKER RECOGNITION [J]. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1977, 25 (04): 330-337

[7] MATHAN L, P ICASSP 90, P149

[8] ONO Y, P EUROSPEECH 93, P355

[9] OPPENHEIM, AV; JOHNSON, DH. DISCRETE REPRESENTATION OF SIGNALS [J]. PROCEEDINGS OF THE INSTITUTE OF ELECTRICAL AND ELECTRONICS ENGINEERS, 1972, 60 (06): 681-+

[10] POTAMIANOS A, P EUROSPEECH 95
