Studies on inter-speaker variability in speech and its application in automatic speech recognition

Cited by: 7
Author
Umesh, S. [1]
Affiliation
[1] Indian Inst Technol, Dept Elect Engn, Madras 600036, Tamil Nadu, India
Source
SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES | 2011 / Vol. 36 / No. 5
Keywords
Vowel-normalization; vocal-tract length normalization; speech-scale; frequency-warping; linear transformation of cepstra; speaker-adaptation; HIDDEN MARKOV-MODELS; CHILDRENS SPEECH; VOCAL-TRACT; ADAPTATION; NORMALIZATION; VOWEL; REPRESENTATION; TRANSFORMATION; CLASSIFICATION; MFCC;
DOI
10.1007/s12046-011-0049-x
Chinese Library Classification
T [Industrial Technology]
Subject classification code
08
Abstract
In this paper, we give an overview of the problem of inter-speaker variability and its study in many diverse areas of speech signal processing. We first give an overview of vowel-normalization studies that minimize variations in the acoustic representation of vowel realizations by different speakers. We then describe the universal-warping approach to speaker normalization which unifies many of the vowel normalization approaches and also shows the relation between speech production, perception and auditory processing. We then address the problem of inter-speaker variability in automatic speech recognition (ASR) and describe techniques that are used to reduce these effects and thereby improve the performance of speaker-independent ASR systems.
Pages: 853-883
Page count: 31