Permutation inconsistency in blind speech separation: Investigation and solutions

Cited by: 60
Authors
Ikram, MZ [1]
Morgan, DR [2]
Affiliations
[1] Texas Instruments Inc, DSP Solut R&D Ctr, Dallas, TX 75243 USA
[2] Lucent Technol Inc, Bell Labs, Murray Hill, NJ 07974 USA
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2005 / Vol. 13 / No. 1
Keywords
beamforming; blind separation; permutation inconsistency; room acoustics; speech enhancement; speech signals
DOI
10.1109/TSA.2004.834441
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Acoustic reverberation severely limits the performance of multiple-microphone blind speech separation (BSS) methods. In this paper, we show that the limited performance is due to random permutations of the unmixing filters over frequency. This problem, which we refer to as permutation inconsistency, becomes worse as the length of the room impulse response increases. We explore interesting connections between BSS and ideal beamforming, which lead us to propose a permutation alignment scheme based on microphone array directivity patterns. Given that the permutations are properly aligned, we show that the blind speech separation method outperforms the nonblind beamformer in a highly reverberant environment. Furthermore, we discover a tradeoff in which permutations can be aligned at the cost of a loss in spectral resolution of the unmixing filters. We then propose a multistage algorithm, which aligns the unmixing filter permutations without sacrificing the spectral resolution. For our study, we perform experiments in both real and simulated environments and compare the results to the ideal performance benchmarks that we derive using prior knowledge of the mixing filters.
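The permutation problem described in the abstract can be illustrated with a toy example. The sketch below is not the authors' algorithm; it only shows the general idea of aligning per-frequency unmixing matrices by the null directions of their directivity patterns. It assumes an anechoic, far-field, two-microphone, two-source setup, and every name and value in it (`steering`, `align`, the 4 cm spacing, the source angles) is a hypothetical choice made for illustration.

```python
import numpy as np

C = 343.0   # speed of sound in m/s (assumed)
D = 0.04    # microphone spacing in m (assumed)

def steering(freq, theta_deg):
    """Far-field response of a 2-mic array to plane waves from theta_deg."""
    theta = np.atleast_1d(np.deg2rad(theta_deg))
    delay = D * np.sin(theta) / C                 # inter-mic delay per direction
    mics = np.arange(2)[:, None]                  # shape (2, 1)
    return np.exp(-2j * np.pi * freq * mics * delay[None, :])  # (2, n_dirs)

def null_direction(w_row, freq, grid):
    """Grid angle at which one unmixing row's directivity pattern is smallest."""
    pattern = np.abs(w_row @ steering(freq, grid))
    return grid[np.argmin(pattern)]

def align(W, freqs, grid):
    """Reorder the rows of each W[k] so that null directions are ascending."""
    aligned = np.empty_like(W)
    for k, f in enumerate(freqs):
        nulls = [null_direction(W[k, i], f, grid) for i in range(2)]
        aligned[k] = W[k, np.argsort(nulls)]
    return aligned

def output_order(W, H):
    """For each bin, the index of the source that dominates each output."""
    return np.argmax(np.abs(W @ H), axis=2)

rng = np.random.default_rng(0)
freqs = np.linspace(500.0, 3500.0, 31)   # kept below C / (2 * D) to avoid spatial aliasing
grid = np.linspace(-90.0, 90.0, 181)     # candidate directions in degrees
sources = [-30.0, 40.0]                  # assumed source directions in degrees

# Ideal mixing matrices per bin and their inverses as unmixing matrices.
H = np.stack([steering(f, sources) for f in freqs])   # (n_bins, 2, 2)
W = np.linalg.inv(H)

# Simulate permutation inconsistency: swap the rows in a random subset of bins.
for k in range(len(freqs)):
    if rng.random() < 0.5:
        W[k] = W[k, ::-1].copy()

aligned = align(W, freqs, grid)
order = output_order(aligned, H)   # one row per frequency bin
```

After alignment, every row of `order` is the same, meaning each output channel follows the same source in all frequency bins. In a reverberant room the directivity patterns are far less clean than in this anechoic toy case, which is the difficulty the abstract refers to.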
Pages: 1-13
Number of pages: 13