Speech spectral modeling and enhancement based on autoregressive conditional heteroscedasticity models

Cited by: 35

Authors:

Cohen, I [1]

Affiliations:

[1] Technion Israel Inst Technol, Dept Elect Engn, IL-32000 Haifa, Israel

Source:

SIGNAL PROCESSING | 2006 / Vol. 86 / No. 04

Keywords:

speech enhancement; speech modeling; GARCH; time-frequency analysis

DOI:

10.1016/j.sigpro.2005.06.005

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronics and communication technology]

Discipline classification codes:

0808; 0809

Abstract:

In this paper, we develop and evaluate speech enhancement algorithms based on supergaussian generalized autoregressive conditional heteroscedasticity (GARCH) models in the short-time Fourier transform (STFT) domain. We consider three different statistical models, two fidelity criteria, and two approaches for the estimation of the variances of the STFT coefficients. The statistical model is either Gaussian, Gamma or Laplacian; the fidelity criteria include minimum mean-squared error (MMSE) of the STFT coefficients and MMSE of the log-spectral amplitude (LSA); the spectral variance is estimated based on either the proposed GARCH models or the decision-directed method of Ephraim and Malah. We show that estimating the variance by the GARCH modeling method yields lower log-spectral distortion and higher perceptual evaluation of speech quality (PESQ, ITU-T P.862) scores than the decision-directed method, whether the presumed statistical model is Gaussian, Gamma or Laplacian, and whether the fidelity criterion is MMSE of the STFT coefficients or MMSE of the LSA. Furthermore, while a Gaussian model is inferior to the supergaussian models when using the decision-directed method, the Gaussian model is superior when using the GARCH modeling method. (c) 2005 Published by Elsevier B.V.
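The two variance-estimation approaches contrasted in the abstract can be illustrated with a minimal sketch. It assumes the textbook real-valued GARCH(1,1) recursion (Bollerslev, 1986) and the commonly quoted form of the decision-directed a priori SNR estimate. It is not the paper's STFT-domain formulation, and the function names, parameter names, and the default weight of 0.98 are illustrative assumptions.

```python
import numpy as np


def garch11_variance(x, omega, alpha, beta):
    """Conditional variance of a textbook GARCH(1,1) process (illustrative only).

    sigma2[t] = omega + alpha * x[t-1]**2 + beta * sigma2[t-1]
    Assumes omega > 0, alpha >= 0, beta >= 0 and alpha + beta < 1.
    """
    x = np.asarray(x, dtype=float)
    sigma2 = np.empty_like(x)
    if x.size == 0:
        return sigma2
    # Initialise with the stationary (unconditional) variance.
    sigma2[0] = omega / (1.0 - alpha - beta)
    for t in range(1, x.size):
        sigma2[t] = omega + alpha * x[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2


def decision_directed_snr(noisy_power, noise_var, prev_clean_power, weight=0.98):
    """Decision-directed a priori SNR estimate in its commonly quoted form.

    noisy_power:      |Y|^2 of the current frame (hypothetical name)
    noise_var:        noise variance estimate for this bin
    prev_clean_power: |X_hat|^2 from the previous frame
    weight:           smoothing weight; 0.98 is an assumed default
    """
    a_posteriori_snr = noisy_power / noise_var
    return (weight * prev_clean_power / noise_var
            + (1.0 - weight) * max(a_posteriori_snr - 1.0, 0.0))
```

In this sketch the GARCH recursion updates the variance from the previous squared observation and the previous variance, whereas the decision-directed estimate blends the previous clean-speech estimate with the current a posteriori SNR.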


Pages: 698-709

Page count: 12
