A robust and precise method for solving the permutation problem of frequency-domain blind source separation

Times cited: 351
Authors
Sawada, H [1]
Mukai, R [1]
Araki, S [1]
Makino, S [1]
Affiliation
[1] NTT Corp, NTT Commun Sci Labs, Kyoto 6190237, Japan
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2004 / Vol. 12 / No. 5
Keywords
blind source separation (BSS); convolutive mixture; direction of arrival (DOA) estimation; frequency domain; independent component analysis (ICA); permutation problem; signal envelope
DOI
10.1109/TSA.2004.832994
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Blind source separation (BSS) for convolutive mixtures can be solved efficiently in the frequency domain, where independent component analysis (ICA) is performed separately in each frequency bin. However, frequency-domain BSS involves a permutation problem: the permutation ambiguity of ICA in each frequency bin must be aligned so that each separated signal in the time domain contains frequency components of the same source signal. This paper presents a robust and precise method for solving the permutation problem. It is based on two approaches: direction of arrival (DOA) estimation for sources and the interfrequency correlation of signal envelopes. We discuss the advantages and disadvantages of the two approaches, and integrate them to exploit their respective advantages. Furthermore, by utilizing the harmonics of signals, we make the new method robust even at low frequencies, where DOA estimation is inaccurate. We also present a new closed-form formula for estimating DOAs from a separation matrix obtained by ICA. Experimental results show that our method provided an almost perfect solution to the permutation problem for a case where two sources were mixed in a room whose reverberation time was 300 ms.
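To make the envelope-correlation idea in the abstract concrete, here is a minimal sketch in Python with NumPy. It is a simplified illustration, not the authors' method: it only aligns bins greedily, in frequency order, against a running sum of already-aligned envelopes, and it omits the DOA and harmonic components that the paper integrates. The function names `correlation` and `align_permutations` and the `(F, N, T)` envelope layout are hypothetical choices made for this example. Brute-force search over permutations is only practical for a small number of sources N.

```python
import itertools

import numpy as np


def correlation(x, y):
    """Pearson correlation coefficient between two envelope sequences."""
    x = x - x.mean()
    y = y - y.mean()
    denom = np.sqrt((x * x).sum() * (y * y).sum())
    return float((x * y).sum() / denom) if denom > 0 else 0.0


def align_permutations(envelopes):
    """Greedy permutation alignment by interfrequency envelope correlation.

    Hypothetical helper (simplified sketch, not the paper's procedure).
    envelopes: array of shape (F, N, T); for each of F frequency bins, the
    magnitude envelopes |y_i(f, t)| of the N ICA outputs over T frames.
    Returns F permutation tuples; perms[f][k] is the ICA output index in
    bin f that is assigned to source k.
    """
    F, N, _ = envelopes.shape
    perms = [tuple(range(N))]  # bin 0 fixes the source ordering
    # Running sum of aligned envelopes acts as a per-source reference.
    reference = envelopes[0].copy()
    for f in range(1, F):
        # Try every permutation and keep the one whose outputs correlate
        # best with the current per-source references.
        best = max(
            itertools.permutations(range(N)),
            key=lambda p: sum(
                correlation(reference[k], envelopes[f, p[k]]) for k in range(N)
            ),
        )
        perms.append(best)
        reference += envelopes[f, list(best)]
    return perms
```

In a usage scenario, one would compute `envelopes` from the ICA outputs of an STFT-domain separation and then reorder the rows of each bin's separation matrix according to `perms[f]`. Using a cumulative reference rather than only the previous bin makes a single badly separated bin less likely to propagate an error, though the greedy order still matters.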
Pages: 530-538
Number of pages: 9