Learning Spectral Mapping for Speech Dereverberation and Denoising

Times cited: 211

Authors:

Han, Kun [1]

Wang, Yuxuan [1]

Wang, DeLiang [1, 2]

Woods, William S. [3]

Merks, Ivo [3]

Zhang, Tao [3]

Affiliations:

[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA

[2] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA

[3] Starkey Hearing Technol, Eden Prairie, MN 55344 USA

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2015 / Vol. 23 / Issue 6

Keywords:

Deep neural networks (DNNs); denoising; dereverberation; spectral mapping; supervised learning; MONAURAL SEGREGATION; REVERBERANT SPEECH; BINARY MASKING; NOISY; INTELLIGIBILITY; ALGORITHM; SEPARATION; NETWORK;

DOI:

10.1109/TASLP.2015.2416653

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

In real-world environments, human speech is usually distorted by both reverberation and background noise, which have negative effects on speech intelligibility and speech quality. They also cause performance degradation in many speech technology applications, such as automatic speech recognition. Therefore, the dereverberation and denoising problems must be dealt with in daily listening environments. In this paper, we propose to perform speech dereverberation using supervised learning, and the supervised approach is then extended to address both dereverberation and denoising. Deep neural networks are trained to directly learn a spectral mapping from the magnitude spectrogram of corrupted speech to that of clean speech. The proposed approach substantially attenuates the distortion caused by reverberation, as well as background noise, and is conceptually simple. Systematic experiments show that the proposed approach leads to significant improvements of predicted speech intelligibility and quality, as well as automatic speech recognition in reverberant noisy conditions. Comparisons show that our approach substantially outperforms related methods.
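The abstract's core idea, a network that reads frames of the corrupted magnitude spectrogram and regresses the clean magnitude spectrogram, can be sketched in NumPy as below. This is a minimal illustration under assumptions that this record does not state: 20 ms Hann frames with a 10 ms hop at 16 kHz, log1p compression, a context window of 2 frames on each side, a single ReLU hidden layer with a linear output, full-batch gradient descent on mean squared error, and synthetic gated tones convolved with a decaying-noise impulse response standing in for real reverberant, noisy speech. The names `log_magnitude_spectrogram`, `splice_context`, and `SpectralMapper` are hypothetical; the paper's actual features, network depth, and training procedure may differ.

```python
import numpy as np

def log_magnitude_spectrogram(x, frame_len=320, hop=160):
    """Log-compressed magnitude spectrogram, shape (n_frames, frame_len // 2 + 1).

    Assumed settings: 20 ms Hann frames with a 10 ms hop at 16 kHz.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1)))

def splice_context(feats, k=2):
    """Concatenate each frame with its k left and k right neighbors (edge frames repeated)."""
    T = feats.shape[0]
    padded = np.pad(feats, ((k, k), (0, 0)), mode="edge")
    return np.concatenate([padded[i:i + T] for i in range(2 * k + 1)], axis=1)

class SpectralMapper:
    """Hypothetical one-hidden-layer ReLU network: corrupted frames -> clean frames."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, np.sqrt(1.0 / n_hidden), (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def predict(self, X):
        H = np.maximum(0.0, X @ self.W1 + self.b1)  # ReLU hidden layer
        return H @ self.W2 + self.b2, H             # linear output: estimated clean log-magnitude

    def train_step(self, X, Y, lr=0.01):
        """One full-batch gradient-descent step on the MSE; returns the loss before the update."""
        Y_hat, H = self.predict(X)
        err = Y_hat - Y
        dY = 2.0 * err / err.size          # gradient of mean(err**2) w.r.t. Y_hat
        dH = (dY @ self.W2.T) * (H > 0)    # back-propagate through the ReLU (uses W2 before update)
        self.W2 -= lr * (H.T @ dY)
        self.b2 -= lr * dY.sum(axis=0)
        self.W1 -= lr * (X.T @ dH)
        self.b1 -= lr * dH.sum(axis=0)
        return float(np.mean(err ** 2))

# Synthetic stand-ins for one clean/corrupted training pair (not real speech).
rng = np.random.default_rng(1)
fs = 16000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 440 * t) * (np.sin(2 * np.pi * 3 * t) > 0)  # gated tone bursts
n_rir = int(0.3 * fs)
rir = rng.standard_normal(n_rir) * np.exp(-np.arange(n_rir) / (0.05 * fs))  # decaying-noise RIR
rir /= np.linalg.norm(rir)
corrupted = np.convolve(clean, rir)[:len(clean)] + 0.05 * rng.standard_normal(len(clean))

X = splice_context(log_magnitude_spectrogram(corrupted), k=2)  # input: corrupted frames with context
Y = log_magnitude_spectrogram(clean)                            # target: clean frames
model = SpectralMapper(X.shape[1], 64, Y.shape[1])
losses = [model.train_step(X, Y) for _ in range(300)]
print(f"MSE: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Only the magnitude is mapped here. Turning the estimate back into a waveform also needs a phase, for example the corrupted signal's phase or an iterative estimate in the style of Griffin and Lim (1984), which this sketch does not show.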

Pages: 982–992

Page count: 11

References: 44 in total

