A spatio-temporal speech enhancement scheme for robust speech recognition in noisy environments

Cited by: 27
Authors
Visser, E
Otsuka, M
Lee, TW
Affiliations
[1] Univ Calif San Diego, Inst Neural Computat, Dept 0523, La Jolla, CA 92093 USA
[2] DENSO Corp, Res Labs, Aichi 4700111, Japan
Keywords
speech enhancement; robust speech recognition; blind source separation; noisy environments
DOI
10.1016/S0167-6393(03)00010-4
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
A new speech enhancement scheme is presented integrating spatial and temporal signal processing methods for robust speech recognition in noisy environments. The scheme first separates spatially localized point sources from noisy speech signals recorded by two microphones. Blind source separation algorithms assuming no a priori knowledge about the sources involved are applied in this spatial processing stage. Then denoising of distributed background noise is achieved in a combined spatial/temporal processing approach. The desired speaker signal is first processed along with an artificially constructed noise signal in a supplementary blind source separation step. It is further denoised by exploiting differences in temporal speech and noise statistics in a wavelet filterbank. The scheme's performance is illustrated by speech recognition experiments on real recordings in a noisy car environment. In comparison to a common multi-microphone technique like beamforming with spectral subtraction, the scheme is shown to enable more accurate speech recognition in the presence of a highly interfering point source and strong background noise. (C) 2003 Elsevier B.V. All rights reserved.
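The spatial processing stage described in the abstract (separating point sources recorded by two microphones with no prior knowledge of the sources) can be illustrated with a minimal sketch. The code below is an assumption-laden illustration, not the authors' implementation: it assumes an instantaneous (non-convolutive) two-channel mixture, synthetic stand-in signals, and a generic fixed-point ICA update with a tanh nonlinearity. The mixing matrix, signal choices, and function names are all hypothetical. Real car recordings involve delays and reverberation that this sketch does not model, and the wavelet denoising stage is not shown.

```python
import numpy as np


def sym_decorrelate(w):
    """Symmetric decorrelation: W <- (W W^T)^(-1/2) W."""
    eigvals, eigvecs = np.linalg.eigh(w @ w.T)
    return (eigvecs / np.sqrt(eigvals)) @ eigvecs.T @ w


def whiten(x):
    """Center a (channels, samples) array and make its covariance the identity."""
    x = x - x.mean(axis=1, keepdims=True)
    eigvals, eigvecs = np.linalg.eigh(np.cov(x))
    return (eigvecs / np.sqrt(eigvals)) @ eigvecs.T @ x


def separate(x, n_iter=200, tol=1e-8, seed=0):
    """Fixed-point ICA (tanh nonlinearity) on a (channels, samples) mixture.

    Assumes instantaneous mixing; returns estimated sources, recovered
    only up to permutation, sign, and scale.
    """
    z = whiten(x)
    n_channels, n_samples = z.shape
    rng = np.random.default_rng(seed)
    w = sym_decorrelate(rng.standard_normal((n_channels, n_channels)))
    for _ in range(n_iter):
        g = np.tanh(w @ z)
        g_prime_mean = (1.0 - g ** 2).mean(axis=1)
        # Update for all rows at once: E[z g(w^T z)] - E[g'(w^T z)] w
        w_new = sym_decorrelate((g @ z.T) / n_samples - g_prime_mean[:, None] * w)
        # Converged when each row is (anti)parallel to its previous value.
        converged = np.max(np.abs(np.abs(np.sum(w_new * w, axis=1)) - 1.0)) < tol
        w = w_new
        if converged:
            break
    return w @ z


# Synthetic stand-ins (assumptions): a heavy-tailed signal in place of
# speech and a pure tone in place of an interfering point source.
rng = np.random.default_rng(42)
n = 20000
t = np.arange(n) / 8000.0
sources = np.vstack([rng.laplace(size=n), np.sin(2 * np.pi * 220.0 * t)])

# Hypothetical 2x2 mixing matrix standing in for the two microphones.
mixing = np.array([[1.0, 0.6], [0.5, 1.0]])
mixtures = mixing @ sources

estimated = separate(mixtures)

# Absolute correlation between each true source and each estimate.
match = np.abs(np.corrcoef(np.vstack([sources, estimated]))[:2, 2:])
print(np.round(match, 3))
```

Because blind separation cannot fix the order or sign of the outputs, the check compares absolute correlations rather than the signals themselves; in a recognition pipeline a further step would be needed to decide which output is the desired speaker.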
Pages: 393-407
Page count: 15