Blind separation of speech mixtures via time-frequency masking

Cited by: 1085
Authors
Yilmaz, Ö
Rickard, S
Affiliations
[1] Univ Maryland, Dept Math, College Pk, MD 20742 USA
[2] Univ Coll Dublin, Dept Elect & Elect Engn, Dublin 2, Ireland
Keywords
DOI
10.1109/TSP.2004.828896
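Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline Classification Code
0808 ; 0809 ;
Abstract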
Binary time-frequency masks are powerful tools for the separation of sources from a single mixture. Perfect demixing via binary time-frequency masks is possible provided the time-frequency representations of the sources do not overlap: a condition we call W-disjoint orthogonality. We introduce here the concept of approximate W-disjoint orthogonality and present experimental results demonstrating the level of approximate W-disjoint orthogonality of speech in mixtures of various orders. The results demonstrate that there exist ideal binary time-frequency masks that can separate several speech signals from one mixture. While determining these masks blindly from just one mixture is an open problem, we show that we can approximate the ideal masks in the case where two anechoic mixtures are provided. Motivated by the maximum likelihood mixing parameter estimators, we define a power weighted two-dimensional (2-D) histogram constructed from the ratio of the time-frequency representations of the mixtures that is shown to have one peak for each source with peak location corresponding to the relative attenuation and delay mixing parameters. The histogram is used to create time-frequency masks that partition one of the mixtures into the original sources. Experimental results on speech mixtures verify the technique. Example demixing results can be found online at http://alum.mit.edu/www/rickard/bss.html.
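The pipeline the abstract describes (ratio of the two mixtures, power weighted 2-D histogram, one peak per source, binary masks applied to one mixture) can be illustrated with a small toy sketch. This is not the authors' implementation. It assumes synthetic time-frequency coefficients that are exactly W-disjoint instead of real speech, weights each point by |X1·X2| as one simple choice of power weighting, bins the raw attenuation and delay directly, and assigns each point to the nearest histogram peak. All function names, bin widths, frequencies, and mixing parameters below are hypothetical and chosen only for illustration.

```python
import cmath
import random
from collections import defaultdict

A_STEP, D_STEP = 0.1, 0.25  # histogram bin widths (illustrative values)

def mix(sources, params, freqs):
    """Two anechoic mixtures; params[j] = (attenuation, delay) of source j at sensor 2."""
    n_freq, n_frames = len(freqs), len(sources[0][0])
    x1 = [[0j] * n_frames for _ in range(n_freq)]
    x2 = [[0j] * n_frames for _ in range(n_freq)]
    for s, (a, d) in zip(sources, params):
        for k, w in enumerate(freqs):
            for t in range(n_frames):
                x1[k][t] += s[k][t]
                x2[k][t] += a * cmath.exp(-1j * w * d) * s[k][t]
    return x1, x2

def local_params(x1_kt, x2_kt, w):
    """Attenuation and delay implied by the mixture ratio at one TF point."""
    ratio = x2_kt / x1_kt
    return abs(ratio), -cmath.phase(ratio) / w

def find_peaks(x1, x2, freqs, n_sources):
    """Power weighted 2-D histogram over (attenuation, delay); return its top bins."""
    hist = defaultdict(float)
    for k, w in enumerate(freqs):
        for t in range(len(x1[k])):
            if abs(x1[k][t]) < 1e-12:  # skip silent points
                continue
            a_hat, d_hat = local_params(x1[k][t], x2[k][t], w)
            bin_key = (round(a_hat / A_STEP), round(d_hat / D_STEP))
            hist[bin_key] += abs(x1[k][t] * x2[k][t])
    top = sorted(hist, key=hist.get, reverse=True)[:n_sources]
    return [(i * A_STEP, j * D_STEP) for i, j in top]

def demix(x1, x2, freqs, peaks):
    """Binary masks: each TF point of mixture 1 goes to the nearest peak."""
    n_freq, n_frames = len(x1), len(x1[0])
    est = [[[0j] * n_frames for _ in range(n_freq)] for _ in peaks]
    for k, w in enumerate(freqs):
        for t in range(n_frames):
            if abs(x1[k][t]) < 1e-12:
                continue
            a_hat, d_hat = local_params(x1[k][t], x2[k][t], w)
            j = min(range(len(peaks)),
                    key=lambda i: (a_hat - peaks[i][0]) ** 2 + (d_hat - peaks[i][1]) ** 2)
            est[j][k][t] = x1[k][t]
    return est

# Toy data: three sources with exactly disjoint TF supports (an idealization).
rng = random.Random(0)
freqs = [0.2 + 0.4 * k for k in range(8)]  # rad/sample; |w * d| stays below pi
n_frames = 12
params = [(0.6, -0.5), (1.0, 0.0), (1.4, 0.5)]  # hypothetical (attenuation, delay)
sources = [[[0j] * n_frames for _ in freqs] for _ in params]
for k in range(len(freqs)):
    for t in range(n_frames):
        owner = (k + t) % len(params)  # exactly one active source per TF point
        sources[owner][k][t] = cmath.rect(rng.uniform(0.5, 1.5),
                                          rng.uniform(-cmath.pi, cmath.pi))

x1, x2 = mix(sources, params, freqs)
peaks = find_peaks(x1, x2, freqs, len(params))
estimates = demix(x1, x2, freqs, peaks)
print(sorted(peaks))
```

Because the toy sources never overlap, every time-frequency point maps exactly onto one source's mixing parameters and the masks recover each source from mixture 1. With real speech the disjointness is only approximate, so the histogram peaks are spread out and the recovered sources contain some interference.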
Pages: 1830-1847
Page count: 18