Nonuniform speaker normalization using affine transformation

Times cited: 4

Authors:

Kumar, S. V. Bharath [1]

Umesh, S. [2]

Affiliations:

[1] Univ Calif San Diego, Dept Elect & Comp Engn, La Jolla, CA 92093 USA

[2] Indian Inst Technol, Dept Elect Engn, Kanpur 208016, Uttar Pradesh, India

Source:

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA | 2008 / Vol. 124 / No. 3

Keywords:

Psychoacoustic;

DOI:

10.1121/1.2951597

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

In this paper, a well-motivated nonuniform speaker normalization model is proposed that affinely relates the formant frequencies of speakers enunciating the same sound. Using the proposed affine model, the corresponding universal-warping function required for normalization is shown to have the same parametric form as the mel scale formula. The parameters of this universal-warping function are estimated from vowel formant data and are shown to be close to those of the commonly used mel scale formula. This reveals an interesting connection between nonuniform speaker normalization and the psychoacoustics-based mel scale. In addition, the affine model fits the vowel formant data better than commonly used ad hoc normalization models. This work is motivated by a desire to improve the performance of speaker-independent speech recognition systems, where speaker normalization is conventionally done by assuming a linear-scaling relationship between the spectra of speakers. The proposed affine relation is extended to describe the relationship between the spectra of speakers enunciating the same sound. On a telephone-based connected digit recognition task, the proposed model provides improved recognition performance over the linear-scaling model. (C) 2008 Acoustical Society of America.
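The abstract states that, under the affine model, the universal-warping function takes the same parametric form as the mel scale formula, m = 2595 log10(1 + f/700). The Python sketch below illustrates one way this can happen, under an assumption added here for illustration: every speaker pair's affine map F_B = alpha*F_A + beta shares a common fixed point at -f0, i.e. beta = (alpha - 1)*f0. With that assumption, a warp psi(f) = a*log10(1 + f/f0) turns the affine relation into a constant shift a*log10(alpha), independent of frequency. The helper names (psi, affine_map, fit_affine), the formant values, and alpha are hypothetical; they are not the paper's data or its estimated parameters.

```python
import math
from statistics import mean


def mel(f_hz):
    """Commonly used mel-scale formula: 2595 * log10(1 + f/700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def psi(f_hz, a=2595.0, f0=700.0):
    """Hypothetical warp with the mel parametric form a*log10(1 + f/f0).

    The defaults are the mel constants, chosen only for illustration,
    not the parameters estimated in the paper.
    """
    return a * math.log10(1.0 + f_hz / f0)


def affine_map(f_a, alpha, f0=700.0):
    """Assumed affine relation F_B = alpha*F_A + beta with beta = (alpha - 1)*f0,
    so all speaker-pair maps share the fixed point f = -f0."""
    beta = (alpha - 1.0) * f0
    return alpha * f_a + beta


def fit_affine(f_a, f_b):
    """Ordinary least-squares fit of F_B = alpha*F_A + beta from formant pairs."""
    mx, my = mean(f_a), mean(f_b)
    sxy = sum((x - mx) * (y - my) for x, y in zip(f_a, f_b))
    sxx = sum((x - mx) ** 2 for x in f_a)
    alpha = sxy / sxx
    beta = my - alpha * mx
    return alpha, beta


# Hypothetical formant frequencies (Hz) for speaker A; illustrative only.
formants_a = [300.0, 870.0, 2240.0, 3000.0]
alpha = 1.15  # hypothetical speaker-pair parameter
formants_b = [affine_map(f, alpha) for f in formants_a]

# In the warped domain the affine relation becomes a constant shift a*log10(alpha).
shifts = [psi(fb) - psi(fa) for fa, fb in zip(formants_a, formants_b)]

# Contrast: under pure linear scaling (beta = 0) the same warp gives
# frequency-dependent shifts, so it no longer reduces to a translation.
linear_shifts = [psi(alpha * f) - psi(f) for f in formants_a]

# Recover (alpha, beta) from the formant pairs and the implied fixed point f0.
alpha_hat, beta_hat = fit_affine(formants_a, formants_b)
f0_hat = beta_hat / (alpha_hat - 1.0)

print("affine shifts:", [round(s, 3) for s in shifts])
print("linear-scaling shifts:", [round(s, 3) for s in linear_shifts])
print("alpha_hat=%.3f beta_hat=%.1f f0_hat=%.1f" % (alpha_hat, beta_hat, f0_hat))
```

In the affine case the four warped shifts agree, while the linear-scaling shifts grow with frequency. Least squares is used here only as a simple way to fit alpha and beta to formant pairs; the estimation procedure in the paper may differ.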


Pages: 1727-1738

Page count: 12
