ROBUST SPEECH RECOGNITION TRAINING VIA DURATION AND SPECTRAL-BASED STRESS TOKEN GENERATION

Cited by: 18
Authors
HANSEN, JHL
BOUGHAZALE, SE
Affiliation
[1] Robust Speech Processing Laboratory, Department of Electrical Engineering, Duke University, Durham, North Carolina 27708-
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 1995 / Vol. 3 / No. 5
Funding
U.S. National Science Foundation
DOI
10.1109/89.466654
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
It is known that speech recognition performance degrades if systems are not trained and tested under similar speaking conditions. This is particularly true if a speaker is exposed to demanding workload stress or noise. For current recognition systems to be successful in applications susceptible to stress, speech recognizers should address the adverse conditions experienced by the user. In this study, we consider the problem of improved recognizer training for various stressed speaking conditions (e.g., slow, loud, and Lombard effect speaking styles). The main objective is to devise a training procedure that produces a hidden Markov model recognizer that better characterizes a given stressed speaking style, without the need for directly collecting such stressed data. The novel approach is to construct a word production model using a previously suggested source generator framework [6], by employing knowledge of the statistical nature of duration and spectral variation of speech under stress. This model is used in turn to produce simulated stressed speech training tokens from neutral speech tokens. The token generation training method is shown to improve isolated word recognition by 24% for Lombard speech when compared to a neutral-trained isolated word recognizer [1]. Further results are reported for isolated and keyword recognition scenarios.
Pages: 415-421
Page count: 7
Related papers
13 entries in total
[1] Bou-Ghazale S.E., Duration and spectral based stress token generation for keyword recognition using hidden Markov models, M.S. thesis, (1993)
[2] Chen Y., Cepstral domain stress compensation for robust speech recognition, Proc. IEEE Int. Conf. Acoust., pp. 717-720, (1987)
[3] Clary G.J., Hansen J.H.L., A novel speech recognizer for keyword spotting, ICSLP-92: Proc. Int. Conf. Spoken Language Proc. Alberta, 1, pp. 13-16, (1992)
[4] Crystal T.H., House A.S., Characterization and modeling of speech-segment, Proc. IEEE Int. Conf. Acoust., pp. 2791-2794, (1986)
[5] Hansen J.H.L., Analysis and compensation of stressed and noisy speech with application to robust automatic recognition, Ph.D. thesis, (1988)
[6] Morphological constrained enhancement with adaptive cepstral compensation (MCE-ACC) for speech recognition in noise and Lombard effect, IEEE Trans. Speech Audio Processing, 2, 4, pp. 598-614, (1994)
[7] Hansen J.H.L., Bria O., Improved automatic speech recognition in noise and Lombard effect, Proc. EURASIP-92, pp. 403-406, (1992)
[8] Lombard effect compensation for robust automatic speech recognition in noise, ICSLP-90: Proc. Int. Conf. Spoken Language Processing, pp. 1125-1128, (1990)
[9] Hansen J.H.L., Clements M.A., Stress compensation and noise reduction algorithms for robust speech recognition, Proc. IEEE Int. Conf. Acoust., pp. 266-269, (1989)
[10] Junqua J., The Lombard reflex and its role on human listeners and automatic speech recognizers, J. Acoust. Soc. Amer., 93, 1, pp. 510-523, (1993)