Missing-data model of vowel identification

Times cited: 31
Authors
de Cheveigné, A
Kawahara, H
Affiliations
[1] Univ Paris 07, CNRS, Lab Linguist Formelle, F-75251 Paris, France
[2] ATR Human Informat Proc Res Labs, Seika, Kyoto 6190288, Japan
[3] Wakayama Univ, CREST, Fac Syst Engn, Design Informat Sci Dept, Media Design Informat Gr, Wakayama 6408510, Japan
DOI
10.1121/1.424675
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Vowel identity correlates well with the shape of the vocal-tract transfer function, in particular the positions of the first two or three formant peaks. In voiced speech, however, the transfer function is sampled at multiples of the fundamental frequency (F0), so the short-term spectrum has peaks at those harmonic frequencies rather than at the formants. It is not clear how the auditory system estimates the original spectral envelope from the vowel waveform. Cochlear excitation patterns, for example, resolve harmonics in the low-frequency region, and their shape varies strongly with F0. The problem cannot be cured by smoothing: lag-domain components of the spectral envelope are aliased and cause F0-dependent distortion. The problem is most acute at high F0s, where the spectral envelope is severely undersampled. This paper treats vowel identification as a process of pattern recognition with missing data. Matching is restricted to the available data, and missing data are ignored by means of an F0-dependent weighting function that emphasizes regions near harmonics. The model is presented in two versions: a frequency-domain version based on short-term spectra or tonotopic excitation patterns, and a time-domain version based on autocorrelation functions. It accounts for the relative F0 independence observed in vowel identification. (C) 1999 Acoustical Society of America. [S0001-4966(99)00906-6].
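The missing-data matching idea in the frequency-domain version can be illustrated with a small sketch. The abstract does not give the exact weighting function, spectral representation, or distance, so everything below is an assumption chosen for illustration, not the authors' implementation: a toy envelope with one resonance per formant (illustrative F1/F2 values), a harmonic line spectrum with Gaussian line shapes standing in for the short-term spectrum, hypothetical Gaussian weighting bumps of width 0.05·F0 centred on each harmonic, and a weighted squared difference in dB. All names (`envelope`, `harmonic_weights`, `identify`, etc.) are hypothetical. The time-domain (autocorrelation) version is not sketched.

```python
import math

# Illustrative (F1, F2) pairs in Hz; rough textbook-style values, not taken from the paper.
VOWELS = {"a": (700.0, 1200.0), "i": (300.0, 2300.0), "u": (300.0, 800.0)}
GRID = [float(f) for f in range(50, 3001, 5)]  # analysis frequencies in Hz
BANDWIDTH = 100.0  # assumed formant bandwidth in Hz


def envelope(f, formants):
    """Toy spectral envelope (linear amplitude): a floor plus one resonance per formant."""
    return 0.01 + sum(1.0 / (1.0 + ((f - fc) / BANDWIDTH) ** 2) for fc in formants)


def to_db(x):
    return 20.0 * math.log10(x)


def harmonics(f0, fmax=GRID[-1]):
    """Harmonic frequencies k*F0 up to fmax."""
    return [k * f0 for k in range(1, int(fmax // f0) + 1)]


def observed_spectrum(formants, f0):
    """Voiced-vowel spectrum in dB: the envelope is known only at k*F0, and each harmonic
    is smeared by a Gaussian line shape (an assumed stand-in for the analysis window)."""
    width = 0.15 * f0
    spec = []
    for f in GRID:
        s = 1e-4  # small floor so the troughs between harmonics stay finite in dB
        for h in harmonics(f0):
            s += envelope(h, formants) * math.exp(-0.5 * ((f - h) / width) ** 2)
        spec.append(to_db(s))
    return spec


def harmonic_weights(f0):
    """Hypothetical F0-dependent weighting: narrow bumps centred on the harmonics, so the
    inter-harmonic regions (the 'missing data') contribute almost nothing to the match."""
    width = 0.05 * f0
    return [sum(math.exp(-0.5 * ((f - h) / width) ** 2) for h in harmonics(f0)) for f in GRID]


def weighted_distance(spec_db, template_db, weights):
    """Weighted mean squared dB difference between a spectrum and a template."""
    num = sum(w * (s - t) ** 2 for s, t, w in zip(spec_db, template_db, weights))
    return num / sum(weights)


# Templates are smooth envelopes, independent of F0.
TEMPLATES = {v: [to_db(envelope(f, fm)) for f in GRID] for v, fm in VOWELS.items()}


def identify(spec_db, f0):
    """Return the vowel whose template is closest, counting only regions near harmonics."""
    w = harmonic_weights(f0)
    return min(TEMPLATES, key=lambda v: weighted_distance(spec_db, TEMPLATES[v], w))


if __name__ == "__main__":
    for f0 in (100.0, 200.0, 300.0):
        for vowel, formants in VOWELS.items():
            print(f0, vowel, identify(observed_spectrum(formants, f0), f0))
```

Because the templates are F0-independent envelopes and only the weights change with F0, the same templates serve every F0 in this toy setup. Replacing `harmonic_weights` with uniform weights is a simple way to see how the inter-harmonic troughs would otherwise enter the distance.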
Pages: 3497-3508
Page count: 12
References: 77