Single channel speech enhancement based on masking properties of the human auditory system

Cited by: 395
Authors
Virag, N [1]
Affiliation
[1] Swiss Fed Inst Technol, Signal Proc Lab, CH-1015 Lausanne, Switzerland
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 1999 / Vol. 7 / No. 2
Keywords
auditory properties; masking; noise reduction; speech recognition; subtractive-type algorithms;
DOI
10.1109/89.748118
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
This paper addresses the problem of single channel speech enhancement at very low signal-to-noise ratios (SNRs) (<10 dB). The proposed approach is based on the introduction of an auditory model in a subtractive-type enhancement process. Single channel subtractive-type algorithms are characterized by a tradeoff between the amount of noise reduction, the speech distortion, and the level of musical residual noise, which can be modified by varying the subtraction parameters. Classical algorithms are usually limited to the use of fixed optimized parameters, which are difficult to choose for all speech and noise conditions. A new computationally efficient algorithm is developed here based on masking properties of the human auditory system. It allows for an automatic adaptation in time and frequency of the parametric enhancement system, and finds the best tradeoff based on a criterion correlated with perception. This leads to a significant reduction of the unnatural structure of the residual noise. Objective and subjective evaluation of the proposed system is performed with several noise types from the Noisex-92 database, having different time-frequency distributions. The application of objective measures, the study of the speech spectrograms, as well as subjective listening tests, confirm that the enhanced speech is more pleasant to a human listener. Finally, the proposed enhancement algorithm is tested as a front-end processor for speech recognition in noise, resulting in improved results over classical subtractive-type algorithms.
Pages: 126-137
Page count: 12