DYNAMIC SPECTRAL SHAPE-FEATURES AS ACOUSTIC CORRELATES FOR INITIAL STOP CONSONANTS

Cited by: 51

Authors:

NOSSAIR, ZB

ZAHORIAN, SA

Affiliation:

[1] Department of Electrical and Computer Engineering, Old Dominion University, Norfolk

Source:

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA | 1991 / Vol. 89 / Issue 6

DOI:

10.1121/1.400735

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

A comprehensive investigation of two acoustic feature sets for English stop consonants spoken in syllable-initial position was conducted to determine the relative invariance of the features that cue place and voicing. The features evaluated were overall spectral shape, encoded as the cosine transform coefficients of the nonlinearly scaled amplitude spectrum, and formants. In addition, features were computed both for the static case, i.e., from one 25-ms frame starting at the burst, and for the dynamic case, i.e., as parameter trajectories over several frames of speech data. All features were evaluated with speaker-independent automatic classification experiments using the data from 15 speakers to train the classifier and the data from 15 different speakers for testing. The primary conclusions from these experiments, as measured via automatic recognition rates, are as follows: (1) spectral shape features are superior to both formants and formants plus amplitudes; (2) features extracted from the dynamic spectrum are superior to features extracted from the static spectrum; and (3) features extracted from the speech signal beginning with the burst onset are superior to features extracted from the speech signal beginning with the vowel transition. Dynamic features extracted from the smoothed spectra over a 60-ms interval timed to begin with the burst onset appear to account for the primary vowel context effects. Automatic recognition results for the six stops (93.7%) based on 20 features were better than the rates obtained with human listeners for a 50-ms segment (89.9%) and only slightly worse than the rates obtained by human listeners for a 100-ms interval (96.6%). Thus the basic conclusion from our work is that dynamic spectral shape features are acoustically invariant cues for both place and voicing in initial stop consonants.
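The two feature types named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the frequency warping, the amplitude nonlinearity, the frame spacing, or how the 20 features are divided, so the mel-like warp, the log amplitude, the 5-ms hop, and the 10 x 2 split below are all assumptions chosen only for illustration. The function names are hypothetical. Only the 25-ms frame and the 60-ms interval starting at the burst onset come from the abstract.

```python
import numpy as np


def cosine_basis(n_terms, n_points):
    """Rows are DCT-II style basis vectors cos(pi * k * (n + 0.5) / N)."""
    k = np.arange(n_terms)[:, None]
    n = np.arange(n_points)[None, :]
    return np.cos(np.pi * k * (n + 0.5) / n_points)


def static_shape_features(frame, sample_rate, n_coeffs=10, n_bins=64):
    """Cosine coefficients of a warped log-amplitude spectrum of one frame."""
    windowed = frame * np.hamming(len(frame))
    magnitude = np.abs(np.fft.rfft(windowed))
    # Assumption: log compression stands in for the amplitude nonlinearity.
    log_mag = np.log(magnitude + 1e-10)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    # Assumption: a mel-like warp stands in for the frequency nonlinearity.
    mel_max = 2595.0 * np.log10(1.0 + (sample_rate / 2.0) / 700.0)
    mel_points = np.linspace(0.0, mel_max, n_bins)
    warped_freqs = 700.0 * (10.0 ** (mel_points / 2595.0) - 1.0)
    warped = np.interp(warped_freqs, freqs, log_mag)
    return cosine_basis(n_coeffs, n_bins) @ warped / n_bins


def dynamic_shape_features(signal, sample_rate, onset, n_coeffs=10,
                           n_time_terms=2, span_ms=60.0, frame_ms=25.0,
                           hop_ms=5.0):
    """Encode each coefficient's trajectory over the span with a cosine series."""
    frame_len = int(round(frame_ms * sample_rate / 1000.0))
    hop = int(round(hop_ms * sample_rate / 1000.0))
    span = int(round(span_ms * sample_rate / 1000.0))
    # Frames start at the (assumed known) burst onset and stay inside the span.
    starts = range(onset, onset + span - frame_len + 1, hop)
    trajectory = np.array([
        static_shape_features(signal[s:s + frame_len], sample_rate, n_coeffs)
        for s in starts
    ])  # shape: (n_frames, n_coeffs)
    time_basis = cosine_basis(n_time_terms, trajectory.shape[0])
    return (time_basis @ trajectory / trajectory.shape[0]).ravel()


# Usage with synthetic noise in place of real speech (illustration only).
rng = np.random.default_rng(0)
demo_signal = rng.standard_normal(16000)
demo_features = dynamic_shape_features(demo_signal, 16000, onset=1000)
```

With the default arguments, the dynamic vector has `n_coeffs * n_time_terms` entries. A static vector for the same token would be `static_shape_features(demo_signal[1000:1400], 16000)`, i.e., one 25-ms frame at a 16-kHz sampling rate.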

Pages: 2978-2991

Number of pages: 14
