Impact of vocal effort variability on automatic speech recognition

Cited by: 58
Authors
Zelinka, Petr [1]
Sigmund, Milan [1]
Schimmel, Jiri [2]
Affiliations
[1] Brno Univ Technol, Dept Radio Elect, Brno 61200, Czech Republic
[2] Brno Univ Technol, Dept Telecommun, Brno 61200, Czech Republic
Keywords
Vocal effort level; Robust speech recognition; Machine learning
DOI
10.1016/j.specom.2012.01.002
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The impact of changes in a speaker's vocal effort on the performance of automatic speech recognition has largely been overlooked by researchers and virtually no speech resources exist for the development and testing of speech recognizers at all vocal effort levels. This study deals with speech properties in the whole range of vocal modes: whispering, soft speech, normal speech, loud speech, and shouting. Fundamental acoustic and phonetic changes are documented. The impact of vocal effort variability on the performance of an isolated-word recognizer is shown and effective means of improving the system's robustness are tested. The proposed multiple model framework approach reaches a 50% relative reduction of word error rate compared to the baseline system. A new specialized speech database, BUT-VE1, is presented, which contains speech recordings of 13 speakers at 5 vocal effort levels with manual phonetic segmentation and sound pressure level calibration. © 2012 Elsevier B.V. All rights reserved.
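As a reading aid for the "50% relative reduction of word error rate" figure in the abstract, the following minimal sketch shows how a relative reduction is conventionally computed from a baseline and an improved word error rate. The function name and the numeric values are hypothetical illustrations; the record does not give the paper's absolute error rates.

```python
def relative_wer_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Return the relative WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - improved_wer) / baseline_wer


# Hypothetical values in percent, NOT taken from the paper:
# a baseline at 20% WER and an improved system at 10% WER.
baseline = 20.0
improved = 10.0
print(f"{relative_wer_reduction(baseline, improved):.0%}")
```

A relative reduction is measured against the baseline error, so the same 50% figure could correspond to many different pairs of absolute error rates.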
Pages: 732-742
Page count: 11