Studies on inter-speaker variability in speech and its application in automatic speech recognition

Cited by: 7
Author
Umesh, S. [1]
Affiliation
[1] Indian Inst Technol, Dept Elect Engn, Madras 600036, Tamil Nadu, India
Source
SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES | 2011 / Vol. 36 / Issue 05
Keywords
Vowel-normalization; vocal-tract length normalization; speech-scale; frequency-warping; linear transformation of cepstra; speaker-adaptation; HIDDEN MARKOV-MODELS; CHILDRENS SPEECH; VOCAL-TRACT; ADAPTATION; NORMALIZATION; VOWEL; REPRESENTATION; TRANSFORMATION; CLASSIFICATION; MFCC;
DOI
10.1007/s12046-011-0049-x
Chinese Library Classification number
T [Industrial Technology];
Discipline classification code
08;
Abstract
In this paper, we give an overview of the problem of inter-speaker variability and its study in many diverse areas of speech signal processing. We first give an overview of vowel-normalization studies that minimize variations in the acoustic representation of vowel realizations by different speakers. We then describe the universal-warping approach to speaker normalization which unifies many of the vowel normalization approaches and also shows the relation between speech production, perception and auditory processing. We then address the problem of inter-speaker variability in automatic speech recognition (ASR) and describe techniques that are used to reduce these effects and thereby improve the performance of speaker-independent ASR systems.
Pages: 853-883
Page count: 31