Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation

Cited by: 512

Authors:

Huang, Po-Sen [1,2]

Kim, Minje [3]

Hasegawa-Johnson, Mark [1]

Smaragdis, Paris [1,3,4]

Affiliations:

[1] Univ Illinois, Dept Elect & Comp Engn, Urbana, IL 61801 USA

[2] Clarifai, New York, NY 10010 USA

[3] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA

[4] Adobe Res, San Francisco, CA 94103 USA

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2015 / Vol. 23 / No. 12

Funding:

U.S. National Science Foundation;

Keywords:

Deep recurrent neural network (DRNN); discriminative training; monaural source separation; time-frequency masking; SINGING-VOICE SEPARATION; SPEECH;

DOI:

10.1109/TASLP.2015.2468583

Chinese Library Classification:

O42 [Acoustics];

Discipline classification code:

070206 [Acoustics];

Abstract:

Monaural source separation is important for many real-world applications. It is challenging because, with only a single channel of information available and no additional constraints, an infinite number of solutions are possible. In this paper, we explore joint optimization of masking functions and deep recurrent neural networks for monaural source separation tasks, including speech separation, singing voice separation, and speech denoising. The joint optimization of the deep recurrent neural networks with an extra masking layer enforces a reconstruction constraint. Moreover, we explore a discriminative criterion for training neural networks to further enhance the separation performance. We evaluate the proposed system on the TSP, MIR-1K, and TIMIT datasets for speech separation, singing voice separation, and speech denoising tasks, respectively. Our approaches achieve 2.30-4.98 dB SDR gain compared to NMF models in the speech separation task, 2.30-2.48 dB GNSDR gain and 4.32-5.42 dB GSIR gain compared to existing models in the singing voice separation task, and outperform NMF and DNN baselines in the speech denoising task.
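The reconstruction constraint mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a soft time-frequency mask built from the magnitudes of two raw network outputs, and the function name `soft_mask_layer`, the `eps` stabilizer, and the toy values are all hypothetical. The point it shows is that when the two masks sum to one, the two source estimates always add back up to the mixture.

```python
def soft_mask_layer(out1, out2, mixture, eps=1e-12):
    """Apply a soft mask to the mixture magnitudes (illustrative assumption).

    out1, out2: raw per-bin outputs for the two sources (hypothetical inputs)
    mixture:    magnitude of the mixture in the same bins
    """
    est1, est2 = [], []
    for a, b, z in zip(out1, out2, mixture):
        # Mask for source 1; the mask for source 2 is its complement.
        m1 = abs(a) / (abs(a) + abs(b) + eps)
        est1.append(m1 * z)
        est2.append((1.0 - m1) * z)
    return est1, est2


# Toy magnitudes for three frequency bins of one frame (made-up values).
raw1 = [0.9, 0.2, 0.5]
raw2 = [0.1, 0.6, 0.5]
mix = [1.0, 1.0, 2.0]

est1, est2 = soft_mask_layer(raw1, raw2, mix)

# The estimates sum to the mixture in every bin, whatever the raw outputs are.
for e1, e2, z in zip(est1, est2, mix):
    assert abs((e1 + e2) - z) < 1e-9
```

Because the masking step is a differentiable function of the raw outputs, it could in principle be trained together with the network rather than applied as post-processing, which is the sense of "joint optimization" in the abstract.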


Pages: 2136-2147

Page count: 12

