A comparative study of traditional and newly proposed features for recognition of speech under stress

Times cited: 158

Authors:

Bou-Ghazale, SE [1]

Hansen, JHL [1]

Affiliation:

[1] Univ Colorado, Ctr Spoken Language Res, Robust Speech Proc Lab, Boulder, CO 80309 USA

Source:

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2000 / Vol. 8 / No. 4

Keywords:

linear prediction; Lombard effect; speech recognition; speech under stress

DOI:

10.1109/89.848224

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

It is well known that the performance of speech recognition algorithms degrades in adverse environments where a speaker is under stress, experiencing emotion, or exhibiting the Lombard effect. This study evaluates the effectiveness of traditional features for recognizing speech under stress and formulates new features that are shown to improve stressed speech recognition. The focus is on formulating robust features that are less dependent on the speaking conditions, rather than on applying compensation or adaptation techniques. The stressed speaking styles considered are simulated angry and loud speech, Lombard effect speech, and noisy actual stressed speech from the SUSAS database, which is available on CD-ROM through the NATO IST/TG-01 research group and the Linguistic Data Consortium (LDC). In addition, this study investigates the immunity of the linear prediction power spectrum and the fast Fourier transform power spectrum to the presence of stress. Our results show that, whereas the fast Fourier transform (FFT) power spectrum is more immune to noise, the linear prediction power spectrum is more immune than the FFT power spectrum to stress, as well as to a combination of noisy and stressful conditions. Finally, the effects of various parameter-processing choices, such as fixed versus variable preemphasis, liftering, and fixed versus cepstral mean normalization, are studied. Two alternative frequency partitioning methods are proposed and compared with traditional mel-frequency cepstral coefficient (MFCC) features for stressed speech recognition. The alternative filterbank frequency partitions are shown to be more effective for recognizing speech under both simulated and actual stressed conditions.
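The abstract contrasts the linear prediction (LP) power spectrum with the FFT power spectrum. As a rough illustration only, the Python sketch below computes both for a single frame after fixed first-order preemphasis and a Hamming window, estimating the LP coefficients with the autocorrelation method and the Levinson-Durbin recursion. The preemphasis factor (0.95), LP order (10), window choice, number of frequency points, and all function names are assumptions made for this example, not settings taken from the paper; the paper's variable-preemphasis variant is not shown. The "FFT" spectrum is evaluated here as a direct DFT sum so the example needs no third-party packages.

```python
import cmath
import math


def preemphasize(frame, alpha=0.95):
    """Fixed first-order preemphasis: y[n] = x[n] - alpha * x[n-1] (alpha is an assumed value)."""
    return [frame[0]] + [frame[n] - alpha * frame[n - 1] for n in range(1, len(frame))]


def hamming(frame):
    """Apply a Hamming window (assumes len(frame) > 1)."""
    n_len = len(frame)
    return [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_len - 1)))
            for n, x in enumerate(frame)]


def autocorr(frame, max_lag):
    """Short-time autocorrelation r[k], k = 0..max_lag (autocorrelation method)."""
    n_len = len(frame)
    return [sum(frame[n] * frame[n + k] for n in range(n_len - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for A(z) = 1 + a1 z^-1 + ... + ap z^-p; return (a, prediction error energy)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for stage i
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k  # error energy shrinks at every stage
    return a, err


def lp_power_spectrum(a, gain2, n_bins):
    """All-pole LP power spectrum G^2 / |A(e^{jw})|^2 on n_bins points in [0, pi]."""
    spec = []
    for m in range(n_bins):
        w = math.pi * m / (n_bins - 1)
        big_a = sum(a[j] * cmath.exp(-1j * w * j) for j in range(len(a)))
        spec.append(gain2 / abs(big_a) ** 2)
    return spec


def fft_power_spectrum(frame, n_bins):
    """Periodogram |X(e^{jw})|^2 at the same frequencies, via a direct DFT sum.

    A real front end would use an FFT; the direct sum only keeps this dependency-free.
    """
    spec = []
    for m in range(n_bins):
        w = math.pi * m / (n_bins - 1)
        x_w = sum(x * cmath.exp(-1j * w * n) for n, x in enumerate(frame))
        spec.append(abs(x_w) ** 2)
    return spec


def analyze(frame, order=10, n_bins=129, alpha=0.95):
    """Return (FFT power spectrum, LP power spectrum) for one frame (hypothetical helper)."""
    x = hamming(preemphasize(frame, alpha))
    r = autocorr(x, order)
    a, err = levinson_durbin(r, order)
    return fft_power_spectrum(x, n_bins), lp_power_spectrum(a, err, n_bins)
```

Calling `fft_spec, lp_spec = analyze(frame)` on a frame of samples returns both spectra on a shared frequency grid. The LP spectrum traces a smooth all-pole envelope, whereas the periodogram retains the harmonic fine structure, so the two can be compared point by point.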

Pages: 429–442

Page count: 14

References: 38
