A frequency warping approach to speaker normalization

Times cited: 218
Authors
Lee, L [1]
Rose, R [1]
Affiliation
[1] MIT, Cambridge, MA 02139 USA
Source
IEEE Transactions on Speech and Audio Processing | 1998 / Vol. 6 / No. 1
Keywords
continuous speech recognition; frequency warping; hidden Markov modeling; speaker normalization
DOI
10.1109/89.650310
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
In an effort to reduce the degradation in speech recognition performance caused by variation in vocal tract shape among speakers, a frequency warping approach to speaker normalization is investigated. A set of low-complexity, maximum-likelihood-based frequency warping procedures has been applied to speaker normalization for a telephone-based connected digit recognition task. This paper presents an efficient means for estimating a linear frequency warping factor and a simple mechanism for implementing frequency warping by modifying the filterbank in mel-frequency cepstrum feature analysis. An experimental study comparing these techniques to other well-known techniques for reducing variability is described. The results show that frequency warping consistently reduces word error rate by 20%, even for very short utterances.
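The abstract compresses two mechanisms into one sentence: choosing a linear warping factor by maximum likelihood, and realizing the warp by rescaling the mel filterbank rather than resampling the signal. The Python sketch below illustrates both under stated assumptions; it is not the authors' implementation. The edge-scaling convention (energy at f moves to alpha * f, so each filter edge e is read from e / alpha), the clipping at the band edge, the candidate grid of 13 factors from 0.88 to 1.12, and the helper names `warped_filter_edges`, `warping_grid`, and `estimate_warp_factor` are illustrative choices. The `score` callback is a hypothetical stand-in for the HMM likelihood computation, which is not shown.

```python
import math
from typing import Callable, List, Sequence


def hz_to_mel(f: float) -> float:
    """Map a frequency in Hz onto the mel scale (2595 * log10(1 + f / 700) convention)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def warped_filter_edges(num_filters: int, f_low: float, f_high: float,
                        alpha: float) -> List[float]:
    """Return the num_filters + 2 edge frequencies (Hz) of a triangular mel
    filterbank under a linear warping factor alpha.

    Filter i spans edges[i] .. edges[i + 2] with its peak at edges[i + 1].
    Assumed convention: warping the spectrum so that energy at f moves to
    alpha * f is equivalent to leaving the spectrum alone and moving every
    filter edge e to e / alpha. Edges pushed past f_high are clipped, a
    simplification with no special band-edge handling.
    """
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (num_filters + 1)
    edges = []
    for k in range(num_filters + 2):
        e = mel_to_hz(m_low + k * step)       # unwarped, mel-spaced edge
        edges.append(min(e / alpha, f_high))  # apply warp, then clip to the band
    return edges


def warping_grid(lo: float = 0.88, hi: float = 1.12, steps: int = 13) -> List[float]:
    """Evenly spaced candidate warping factors from lo to hi inclusive (assumed range)."""
    return [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]


def estimate_warp_factor(score: Callable[[float], float],
                         alphas: Sequence[float]) -> float:
    """Grid-search maximum likelihood: return the alpha with the highest score.

    `score` is a hypothetical caller-supplied function that should compute
    cepstra using warped_filter_edges(..., alpha) and return the log-likelihood
    of those features under the HMMs given the utterance's transcription.
    """
    return max(alphas, key=score)


if __name__ == "__main__":
    # Toy stand-in for the HMM log-likelihood, peaked at alpha = 1.04.
    toy_score = lambda a: -(a - 1.04) ** 2
    best = estimate_warp_factor(toy_score, warping_grid())
    print(round(best, 2))  # 1.04
    print([round(e) for e in warped_filter_edges(4, 0.0, 4000.0, best)])
```

Design note: only the filter edges depend on alpha, so each candidate factor can reuse the same FFT magnitude spectrum and needs only a new filterbank pass, which is one way to read the abstract's "simple mechanism" for implementing the warp.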
Pages: 49-60
Page count: 12
References (14 entries)
[1] Andreou A. P CAIP WORKSH FRONT, 1994, V2.
[2] Davis SB, Mermelstein P. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech and Signal Processing, 1980, 28(4): 357-366.
[3] Fant G. STL QPSR, 1975, V2, P1.
[4] Gauvain JL, Lee CH. Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains. IEEE Transactions on Speech and Audio Processing, 1994, 2(2): 291-298.
[5] Lee L. P ICASSP 96, P353.
[6] Markel JD, Oshika BT, Gray AH. Long-term feature averaging for speaker recognition. IEEE Transactions on Acoustics, Speech and Signal Processing, 1977, 25(4): 330-337.
[7] Mathan L. P ICASSP 90, P149.
[8] Ono Y. P EUROSPEECH 93, P355.
[9] Oppenheim AV, Johnson DH. Discrete representation of signals. Proceedings of the IEEE, 1972, 60(6): 681-+.
[10] Potamianos A. P EUROSPEECH 95.