Segregating information about the size and shape of the vocal tract using a time-domain auditory model: The stabilised wavelet-Mellin transform

Cited by: 79
Authors
Irino, T
Patterson, RD
Affiliations
[1] ATR, Human Informat Proc Res Labs, Kyoto 6190288, Japan
[2] Univ Cambridge, Dept Physiol, Ctr Neural Basis Hearing, Cambridge CB2 3EG, England
Keywords
auditory pathway; Mellin transform; wavelet transform; stabilised auditory image; size-shape image; gammachirp auditory filter;
DOI
10.1016/S0167-6393(00)00085-6
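Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 [Acoustics]; 082403 [Underwater Acoustic Engineering];
Abstract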
We hear vowels pronounced by men and women as approximately the same although the length of the vocal tract varies considerably from group to group. At the same time, we can identify the speaker group. This suggests that the auditory system can extract and separate information about the size of the vocal tract from information about its shape. The duration of the impulse response of the vocal tract expands or contracts as the length of the vocal tract increases or decreases. There is a transform, the Mellin transform, that is immune to the effects of time dilation; it maps impulse responses that differ in temporal scale onto a single distribution and encodes the size information separately as a scalar constant. In this paper we investigate the use of the Mellin transform for vowel normalisation. In the auditory system, sounds are initially subjected to a form of wavelet analysis in the cochlea and then, in each frequency channel, the repeating patterns produced by periodic sounds appear to be stabilised by a form of time-interval calculation. The result is like a two-dimensional array of interval histograms and it is referred to as an auditory image. In this paper, we show that there is a two-dimensional form of the Mellin transform that can convert the auditory images of vowel sounds from vocal tracts with different sizes into an invariant Mellin image (MI) and, thereby, facilitate the extraction and separation of the size and shape information associated with a given vowel type. In signal processing terms, the MI of a sound is the Mellin transform of a stabilised wavelet transform of the sound. We suggest that the MI provides a good model of auditory vowel normalisation, and that this provides a good framework for auditory processing from cochlea to cortex. (C) 2002 Elsevier Science B.V. All rights reserved.
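The dilation invariance described in the abstract can be illustrated with a minimal numerical sketch. It uses the one-dimensional Mellin transform in its scale-invariant form, M(c) = ∫ f(t) t^(-1/2 - jc) dt, and not the two-dimensional stabilised wavelet-Mellin transform of the paper. The damped-sinusoid "impulse response", the dilation factor 1.3, the integration grid, and the function names are all hypothetical choices made for illustration; the energy-preserving factor sqrt(a) is likewise an assumption.

```python
import numpy as np

def mellin_magnitude(f, c_values, u_min=-12.0, u_max=6.0, n=20001):
    """Return |M(c)| for M(c) = integral over t > 0 of f(t) * t**(-1/2 - 1j*c) dt.

    Substituting t = exp(u) turns the integral into a Fourier-type
    integral of f(exp(u)) * exp(u/2) over u, evaluated here as a
    plain Riemann sum on a uniform grid in u.
    """
    u = np.linspace(u_min, u_max, n)
    du = u[1] - u[0]
    g = f(np.exp(u)) * np.exp(u / 2.0)
    kernel = np.exp(-1j * np.outer(c_values, u))  # shape (len(c_values), n)
    return np.abs(kernel @ g) * du

def impulse_response(t):
    # Hypothetical stand-in for a vocal-tract impulse response:
    # a single exponentially damped resonance.
    return np.exp(-t) * np.sin(2.0 * np.pi * 1.5 * t)

def dilated(a):
    # Time-compressed copy f_a(t) = sqrt(a) * f(a * t); a > 1 mimics
    # a shorter vocal tract with a proportionally shorter response.
    return lambda t: np.sqrt(a) * impulse_response(a * t)

c = np.linspace(0.0, 10.0, 21)
ref = mellin_magnitude(impulse_response, c)
short = mellin_magnitude(dilated(1.3), c)

# Dilation multiplies M(c) by the pure phase factor a**(1j*c),
# so the two magnitude profiles should agree up to numerical error.
max_diff = float(np.max(np.abs(ref - short)))
print(max_diff)
```

In this sketch the dilation factor survives only in the phase of M(c), which is one way to read the abstract's statement that size is encoded separately from the shape-dependent distribution.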
Pages: 181-203
Page count: 23
Related papers
35 in total
[1]
FOURIER-MELLIN TRANSFORM AND MAMMALIAN HEARING [J].
ALTES, RA .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1978, 63 (01) :174-183
[2]
Acoustic correlates of talker sex and individual talker identity are present in a short vowel segment produced in running speech [J].
Bachorowski, JA ;
Owren, MJ .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1999, 106 (02) :1054-1063
[3]
BERTRAND J, 1996, TRANSFORMS APPL HDB
[4]
Frequency glides in the impulse responses of auditory-nerve fibers [J].
Carney, LH ;
McDuffy, MJ ;
Shekhter, I .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1999, 105 (04) :2384-2391
[5]
COHEN L, 1991, P SOC PHOTO-OPT INS, V1566, P109, DOI 10.1117/12.49816
[6]
THE SCALE REPRESENTATION [J].
COHEN, L .
IEEE TRANSACTIONS ON SIGNAL PROCESSING, 1993, 41 (12) :3275-3292
[7]
COMBES JM, 1989, WAVELETS
[8]
COMPARISON OF PARAMETRIC REPRESENTATIONS FOR MONOSYLLABIC WORD RECOGNITION IN CONTINUOUSLY SPOKEN SENTENCES [J].
DAVIS, SB ;
MERMELSTEIN, P .
IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1980, 28 (04) :357-366
[9]
COCHLEAR ENCODING - POTENTIALITIES AND LIMITATIONS OF REVERSE-CORRELATION TECHNIQUE [J].
DEBOER, E ;
DEJONGH, HR .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1978, 63 (01) :115-135
[10]
Fant G., 1970, Acoustic Theory of Speech Production