Grouping separated frequency components by estimating propagation model parameters in frequency-domain blind source separation

Cited by: 91
Authors
Sawada, Hiroshi [1 ]
Araki, Shoko [1 ]
Mukai, Ryo [1 ]
Makino, Shoji [1 ]
Affiliations
[1] NTT Corp, Commun Sci Labs, NTT, Kyoto 6190237, Japan
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2007 / Vol. 15 / No. 5
Keywords
blind source separation (BSS); convolutive mixture; frequency domain; generalized cross correlation; independent component analysis (ICA); permutation problem; sparseness; time delay estimation; time-frequency (T-F) masking;
DOI
10.1109/TASL.2007.899218
Chinese Library Classification
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This paper proposes a new formulation and optimization procedure for grouping frequency components in frequency-domain blind source separation (BSS). We adopt two separation techniques, independent component analysis (ICA) and time-frequency (T-F) masking, for the frequency-domain BSS. With ICA, grouping the frequency components corresponds to aligning the permutation ambiguity of the ICA solution in each frequency bin. With T-F masking, grouping the frequency components corresponds to classifying sensor observations in the time-frequency domain for individual sources. The grouping procedure is based on estimating anechoic propagation model parameters by analyzing ICA results or sensor observations. More specifically, the time delays of arrival and attenuations from a source to all sensors are estimated for each source. The focus of this paper includes the applicability of the proposed procedure for a situation with wide sensor spacing where spatial aliasing may occur. Experimental results show that the proposed procedure effectively separates two or three sources with several sensor configurations in a real room, as long as the room reverberation is moderately low.
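The grouping idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes two sources, two sensors, and that the delay and attenuation of each source are already known, whereas the paper estimates them. The names `model_ratio`, `align_bin`, and `demo`, and all numeric values, are hypothetical. Under an anechoic model, the ratio of a source's response at sensor 2 to that at sensor 1 is taken here to be `gain * exp(-j*2*pi*f*delay)`, and each frequency bin is aligned by choosing the permutation whose components lie closest to the modelled ratios.

```python
import cmath
import itertools
import math
import random


def model_ratio(freq_hz, delay_s, gain):
    """Anechoic model of the sensor-2 / sensor-1 response ratio for one source."""
    return gain * cmath.exp(-2j * math.pi * freq_hz * delay_s)


def align_bin(ratios, freq_hz, params):
    """Return perm such that ratios[perm[k]] is the best match for source k.

    ratios: observed complex ratios, one per separated component in this bin.
    params: list of (delay_s, gain) pairs, one per source (assumed known here).
    """
    n = len(params)
    best = min(
        itertools.permutations(range(n)),
        key=lambda perm: sum(
            abs(ratios[perm[k]] - model_ratio(freq_hz, *params[k])) ** 2
            for k in range(n)
        ),
    )
    return list(best)


def demo(seed=0, noise_std=0.01):
    """Shuffle the components in every bin, then count correctly aligned bins."""
    rng = random.Random(seed)
    # Hypothetical (delay in seconds, gain) per source; not values from the paper.
    params = [(2.0e-4, 0.9), (-1.5e-4, 1.1)]
    # 31 bins from 125 Hz to 3875 Hz. With a 0.2 ms delay the phase exceeds pi
    # above 2500 Hz, so the upper bins are spatially aliased.
    freqs = [125.0 * i for i in range(1, 32)]
    correct = 0
    for f in freqs:
        true = [model_ratio(f, d, g) for d, g in params]
        order = [0, 1]
        rng.shuffle(order)  # unknown per-bin permutation
        observed = [
            true[i] + complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))
            for i in order
        ]
        perm = align_bin(observed, f, params)
        # observed[j] came from source order[j]; alignment is right when
        # the component picked for source k really is source k.
        if [order[p] for p in perm] == [0, 1]:
            correct += 1
    return correct, len(freqs)
```

Because the comparison is made between complex values rather than unwrapped phases, the sketch needs no special handling for bins where the phase wraps, which is the wide-spacing situation the abstract mentions. Calling `demo()` returns the number of correctly aligned bins and the total number of bins.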
Pages: 1592-1604
Page count: 13