DYNAMIC SPECTRAL SHAPE-FEATURES AS ACOUSTIC CORRELATES FOR INITIAL STOP CONSONANTS

Cited by: 51
Authors
NOSSAIR, ZB
ZAHORIAN, SA
Affiliations
[1] Department of Electrical and Computer Engineering, Old Dominion University, Norfolk
DOI
10.1121/1.400735
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
A comprehensive investigation of two acoustic feature sets for English stop consonants spoken in syllable-initial position was conducted to determine the relative invariance of the features that cue place and voicing. The features evaluated were overall spectral shape, encoded as the cosine transform coefficients of the nonlinearly scaled amplitude spectrum, and formants. In addition, features were computed both for the static case, i.e., from one 25-ms frame starting at the burst, and for the dynamic case, i.e., as parameter trajectories over several frames of speech data. All features were evaluated with speaker-independent automatic classification experiments using the data from 15 speakers to train the classifier and the data from 15 different speakers for testing. The primary conclusions from these experiments, as measured via automatic recognition rates, are as follows: (1) spectral shape features are superior to both formants alone and formants plus amplitudes; (2) features extracted from the dynamic spectrum are superior to features extracted from the static spectrum; and (3) features extracted from the speech signal beginning with the burst onset are superior to features extracted from the speech signal beginning with the vowel transition. Dynamic features extracted from the smoothed spectra over a 60-ms interval timed to begin with the burst onset appear to account for the primary vowel context effects. The automatic recognition rate for the 6 stops (93.7%), based on 20 features, was better than the rate obtained with human listeners for a 50-ms segment (89.9%) and only slightly worse than the rate obtained with human listeners for a 100-ms interval (96.6%). Thus, the basic conclusion from our work is that dynamic spectral shape features are acoustically invariant cues for both place and voicing in initial stop consonants.
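The spectral shape encoding described in the abstract can be illustrated with a rough sketch. The code below is not the authors' implementation: the abstract does not specify the nonlinear scaling, the number of coefficients per frame, or the frame spacing, so the log amplitude scaling, the Hamming window, the 5-ms step, the 8-frame count, and the function names `spectral_shape_features` and `dynamic_features` are all assumptions chosen for illustration. Only the 25-ms frame length and the idea of tracking cosine transform coefficients over several frames starting at the burst come from the abstract.

```python
import numpy as np

def spectral_shape_features(frame, n_coeffs=10):
    """Cosine transform coefficients of a log-scaled amplitude spectrum.

    Assumption: log scaling stands in for the paper's unspecified
    nonlinear amplitude scaling; n_coeffs=10 is an arbitrary choice.
    """
    windowed = frame * np.hamming(len(frame))
    magnitude = np.abs(np.fft.rfft(windowed))
    log_spectrum = np.log(magnitude + 1e-10)  # small offset avoids log(0)
    n_bins = len(log_spectrum)
    # DCT-II style basis: row k is cos(pi * k * (n + 0.5) / n_bins)
    k = np.arange(n_coeffs)[:, None]
    n = np.arange(n_bins)[None, :]
    basis = np.cos(np.pi * k * (n + 0.5) / n_bins)
    return basis @ log_spectrum / n_bins

def dynamic_features(signal, sample_rate, onset,
                     frame_ms=25, step_ms=5, n_frames=8, n_coeffs=10):
    """Stack per-frame coefficients into a trajectory starting at `onset`.

    Assumption: 8 frames of 25 ms at a 5-ms step span 60 ms in total;
    the step and frame count are illustrative, not taken from the paper.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    step = int(sample_rate * step_ms / 1000)
    rows = []
    for i in range(n_frames):
        start = onset + i * step
        rows.append(spectral_shape_features(signal[start:start + frame_len],
                                            n_coeffs))
    return np.stack(rows)  # shape: (n_frames, n_coeffs)
```

In this sketch the static case corresponds to calling `spectral_shape_features` on the single frame beginning at the burst onset, while the dynamic case is the full array returned by `dynamic_features`, whose columns trace each coefficient over time.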
Pages: 2978-2991
Number of pages: 14
Related papers
34 in total
[1]   ACOUSTIC INVARIANCE IN SPEECH PRODUCTION - EVIDENCE FROM MEASUREMENTS OF THE SPECTRAL CHARACTERISTICS OF STOP CONSONANTS [J].
BLUMSTEIN, SE ;
STEVENS, KN .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1979, 66 (04) :1001-1017
[2]  
BLUMSTEIN SE, 1980, J ACOUST SOC AM, V67, P68
[3]   FEATURE SELECTION VIA DYNAMIC-PROGRAMMING FOR TEXT-INDEPENDENT SPEAKER IDENTIFICATION [J].
CHEUNG, RS ;
EISENSTEIN, BA .
IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1978, 26 (05) :397-403
[4]   TOWARD A THEORY OF SPEECH-PERCEPTION [J].
COLE, RA ;
SCOTT, B .
PSYCHOLOGICAL REVIEW, 1974, 81 (04) :348-374
[5]   PHANTOM IN PHONEME - INVARIANT CUES FOR STOP CONSONANTS [J].
COLE, RA ;
SCOTT, B .
PERCEPTION & PSYCHOPHYSICS, 1974, 15 (01) :101-107
[6]   ACOUSTIC LOCI AND TRANSITIONAL CUES FOR CONSONANTS [J].
DELATTRE, PC ;
LIBERMAN, AM ;
COOPER, FS .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1955, 27 (04) :769-773
[7]   STOP-CONSONANT RECOGNITION - RELEASE BURSTS AND FORMANT TRANSITIONS AS FUNCTIONALLY EQUIVALENT, CONTEXT-DEPENDENT CUES [J].
DORMAN, MF ;
STUDDERTKENNEDY, M ;
RAPHAEL, LJ .
PERCEPTION & PSYCHOPHYSICS, 1977, 22 (02) :109-122
[8]  
Duda R. O., 1973, PATTERN CLASSIFICATI
[9]  
FISCHERJORGENSE.E, 1954, MISC PHONET, V2, P42
[10]   ON THE ROLE OF SPECTRAL TRANSITION FOR SPEECH-PERCEPTION [J].
FURUI, S .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1986, 80 (04) :1016-1025