Blind Spatial Subtraction Array for Speech Enhancement in Noisy Environment

Cited by: 85
Authors
Takahashi, Yu [1]
Takatani, Tomoya [1]
Osako, Keiichi [1]
Saruwatari, Hiroshi [1]
Shikano, Kiyohiro [1]
Affiliation
[1] Nara Inst Sci & Technol, Grad Sch Informat Sci, Nara 6300192, Japan
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2009 / Vol. 17 / No. 04
Keywords
Blind source separation (BSS); independent component analysis (ICA); microphone array; speech enhancement; INDEPENDENT COMPONENT ANALYSIS; SEPARATION; ALGORITHM;
DOI
10.1109/TASL.2008.2011517
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
We propose a new blind spatial subtraction array (BSSA) consisting of a noise estimator based on independent component analysis (ICA) for efficient speech enhancement. In this paper, we first point out, theoretically and experimentally, that under a non-point-source noise condition ICA is proficient in noise estimation rather than in speech estimation. Therefore, we propose BSSA, which utilizes ICA as a noise estimator. In BSSA, speech extraction is achieved by subtracting the power spectrum of the noise signals estimated using ICA from the power spectrum of the target speech signal partly enhanced with a delay-and-sum beamformer. This "power-spectrum-domain subtraction" procedure is robust to estimation errors and enables better noise reduction than conventional ICA. Another benefit of the BSSA architecture is "permutation robustness." Although the ICA part in BSSA suffers from a source permutation problem, the BSSA architecture can reduce the negative effect when a permutation arises. The results of various speech enhancement tests reveal that the noise reduction and speech recognition performance of the proposed BSSA are superior to those of conventional methods.
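The "power-spectrum-domain subtraction" step described in the abstract can be illustrated with a minimal sketch for a single STFT frame. This is not the authors' implementation: the abstract gives no formulas, so the function names, the oversubtraction factor `beta`, the flooring factor `floor`, and the reuse of the beamformer phase are all assumptions made for illustration. The noise spectrum is taken as given here; in BSSA it would come from the ICA-based noise estimator.

```python
import numpy as np

def delay_and_sum(mic_spec, steering):
    """Delay-and-sum beamformer in the frequency domain (illustrative).

    mic_spec: complex array (n_mics, n_bins), one STFT frame per microphone.
    steering: complex array (n_mics, n_bins), unit-modulus phase terms of the
              assumed target direction (hypothetical input).
    """
    # Compensate each channel's phase, then average across microphones.
    return np.mean(np.conj(steering) * mic_spec, axis=0)

def power_spectrum_subtract(ds_spec, noise_spec, beta=1.0, floor=0.0):
    """Subtract an estimated noise power spectrum from the beamformer output.

    ds_spec:    complex array (n_bins,), delay-and-sum output.
    noise_spec: complex array (n_bins,), estimated noise (assumed given).
    beta, floor: assumed oversubtraction and flooring factors.
    """
    ds_pow = np.abs(ds_spec) ** 2
    noise_pow = np.abs(noise_spec) ** 2
    diff = ds_pow - beta * noise_pow
    # Where the subtraction would go negative, fall back to a floored value.
    out_pow = np.where(diff > 0.0, diff, floor * ds_pow)
    # Only power spectra are combined; the beamformer phase is reused.
    return np.sqrt(out_pow) * np.exp(1j * np.angle(ds_spec))
```

Because only power spectra enter the subtraction, the phase of the noise estimate does not affect the result in this sketch.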
Pages: 650-664
Page count: 15