Analysis and recognition of whispered speech

Times cited: 121

Authors:

Ito, T [1]

Takeda, K [1]

Itakura, F [1]

Affiliation:

[1] Graduate School of Engineering, Nagoya University, Nagoya, Aichi 464-8603, Japan

Source:

SPEECH COMMUNICATION | 2005 / Vol. 45 / Issue 2

Keywords:

speech recognition; whispered speech; telephone handset; noise robustness

DOI:

10.1016/j.specom.2003.10.005

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

In this study, we examine the acoustic characteristics of whispered speech and address several issues in recognizing whispered speech used for communication over a mobile phone in a noisy environment. The acoustic analysis shows that vowel formant frequencies are shifted upward in whispered speech relative to normal speech. Consonants that are voiced in normal speech have, when whispered, lower energy in the low-frequency region up to 1.5 kHz and greater spectral flatness. In whispered speech recognition experiments, adapting the models with a small amount of whispered speech from the target speaker proved effective for recognizing that speaker's whispered speech. In a noisy environment, recognition accuracy degrades much more for whispered speech than for the same utterances spoken normally. Covering the mouth with a hand, which raises the SNR, was shown to yield higher recognition accuracy for whispered speech, a speaking style often used for private communication in noisy environments. (C) 2004 Elsevier B.V. All rights reserved.
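The abstract compares whispered and normal consonants by spectral flatness and by low-frequency energy up to 1.5 kHz, and attributes the hand-covering gain to a higher SNR. The sketch below illustrates these three quantities on a single frame of samples. It is a minimal illustration under stated assumptions, not the authors' analysis procedure: the abstract does not give their framing, windowing, or exact flatness definition. Here spectral flatness is assumed to be the common geometric-to-arithmetic mean ratio of the power spectrum, a rectangular window and a naive DFT are used so the block needs only the standard library, and all function names (`power_spectrum`, `spectral_flatness`, `low_band_energy_ratio`, `snr_db`) are hypothetical.

```python
import cmath
import math


def power_spectrum(frame):
    """Power spectrum |X[k]|^2 for bins k = 0..N/2 via a naive O(N^2) DFT.

    Rectangular window, no zero-padding: adequate for illustration;
    an FFT would be used in practice.
    """
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_flatness(power, eps=1e-12):
    """Geometric mean / arithmetic mean of the power spectrum (assumed definition).

    Close to 0 for a peaky (tonal) spectrum and larger for a flat, noise-like one.
    eps floors every bin so log(0) never occurs.
    """
    floored = [p + eps for p in power]
    geometric = math.exp(sum(math.log(p) for p in floored) / len(floored))
    arithmetic = sum(floored) / len(floored)
    return geometric / arithmetic


def low_band_energy_ratio(power, sample_rate, cutoff_hz=1500.0):
    """Fraction of the frame's energy in bins at or below cutoff_hz (1.5 kHz by default)."""
    n_fft = 2 * (len(power) - 1)  # assumes the frame had an even number of samples
    total = sum(power)
    if total == 0:
        return 0.0
    low = sum(p for k, p in enumerate(power) if k * sample_rate / n_fft <= cutoff_hz)
    return low / total


def snr_db(speech, noise):
    """SNR in dB from separately recorded speech and noise samples (mean-power ratio)."""
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    return 10.0 * math.log10(p_speech / p_noise)


if __name__ == "__main__":
    sr, n = 8000, 256  # hypothetical 8 kHz telephone-band frame of 256 samples
    tone = [math.sin(2 * math.pi * 250 * t / sr) for t in range(n)]
    p = power_spectrum(tone)
    print(f"flatness={spectral_flatness(p):.4f}")
    print(f"low-band energy ratio={low_band_energy_ratio(p, sr):.3f}")
    print(f"SNR={snr_db([2.0] * 10, [1.0] * 10):.2f} dB")
```

Under these assumptions, a pure tone gives a flatness near 0 while a noise-like frame gives a much higher value, which is the direction of the difference the abstract reports for whispered versus normally voiced consonants. A 2:1 speech-to-noise amplitude ratio corresponds to about 6 dB SNR.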

Citation

Pages: 139-152

Number of pages: 14

References (14 entries in total; entries [1]-[10] are listed here):

[1] Eklund, I; Traunmüller, H. Comparative study of male and female whispered and phonated versions of the long vowels of Swedish [J]. Phonetica, 1997, 54(1): 1-21.

[2] Fujimura, O; Lindqvist, J. Sweep-tone measurements of vocal-tract characteristics [J]. Journal of the Acoustical Society of America, 1971, 49(2): 541 ff.

[3] Holmes, JN. Journal of the Acoustical Society of America, 1983, 73: S87.

[4] Itou, K. P OR COCOSDA MAY 199, 1998.

[5] Kallail, KJ; Emanuel, FW. Formant-frequency differences between isolated whispered and phonated vowel samples produced by adult female subjects [J]. Journal of Speech and Hearing Research, 1984, 27(2): 245-251.

[6] Kawahara, T. P IEEE AUT SPEECH RE, 1999: 393.

[7] Konno, H. SP95140 IEICE, 1996: 39.

[8] Kurematsu, A; Takeda, K; Sagisaka, Y; Katagiri, S; Kuwabara, H; Shikano, K. ATR Japanese speech database as a tool of speech recognition and synthesis [J]. Speech Communication, 1990, 9(4): 357-363.

[9] Leggetter, CJ. P ARPA SPOK LANG TEC, 1995.

[10] Meyer-Eppler, W. Realization of prosodic features in whispered speech [J]. Journal of the Acoustical Society of America, 1957, 29(1): 104-106.
