ON THE APPLICATION OF HIDDEN MARKOV-MODELS FOR ENHANCING NOISY SPEECH

Cited by: 87
Authors
EPHRAIM, Y [1 ]
MALAH, D [1 ]
JUANG, BH [1 ]
Affiliation
[1] AT&T BELL LABS, DEPT SIGNAL PROC, MURRAY HILL, NJ 07974 USA
Source
IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING | 1989 / Vol. 37 / No. 12
Keywords
Probability - Signal Filtering and Prediction;
DOI
10.1109/29.45532
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
A maximum-a-posteriori approach for enhancing speech signals which have been degraded by statistically independent additive noise is proposed. The approach is based on statistical modeling of the clean speech signal and the noise process using long training sequences from the two processes. Hidden Markov models (HMMs) with mixtures of Gaussian autoregressive (AR) output probability distributions (PDs) are used to model the clean speech signal. The model for the noise process depends on its nature. The parameter set of the HMM is estimated using the Baum or the EM (expectation-maximization) algorithm. The noisy speech is enhanced by reestimating the clean speech waveform using the EM algorithm. Efficient approximations of the training and enhancement procedures are examined. This results in the segmental k-means approach for hidden Markov modeling, in which the state sequence and the parameter set of the model are alternately estimated. Similarly, the enhancement is done by alternate estimation of the state and observation sequences. An approximate improvement of 4.0-6.0 dB in signal-to-noise ratio (SNR) is achieved at 10-dB input SNR.
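The alternation described in the abstract (estimate the state sequence, then re-estimate the model parameters, and repeat) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single scalar Gaussian per state instead of mixtures of Gaussian AR densities, a uniform initial state distribution, add-one smoothing of transition counts, and a variance floor. The function names `viterbi` and `segmental_kmeans` are hypothetical.

```python
import math


def viterbi(obs, means, variances, log_trans, log_init):
    """Most likely state path under scalar Gaussian emissions (a simplification)."""
    n = len(means)

    def log_gauss(x, k):
        return -0.5 * (math.log(2 * math.pi * variances[k])
                       + (x - means[k]) ** 2 / variances[k])

    score = [log_init[k] + log_gauss(obs[0], k) for k in range(n)]
    back = []
    for x in obs[1:]:
        prev, score, ptr = score, [], []
        for k in range(n):
            best = max(range(n), key=lambda j: prev[j] + log_trans[j][k])
            ptr.append(best)
            score.append(prev[best] + log_trans[best][k] + log_gauss(x, k))
        back.append(ptr)
    # Backtrack from the best final state.
    state = max(range(n), key=lambda k: score[k])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def segmental_kmeans(obs, init_means, init_vars, n_iter=10, var_floor=1e-3):
    """Alternate between state-sequence decoding and parameter re-estimation."""
    means, variances = list(init_means), list(init_vars)
    n = len(means)
    log_init = [math.log(1.0 / n)] * n  # assumed uniform and kept fixed
    log_trans = [[math.log(1.0 / n)] * n for _ in range(n)]
    path = None
    for _ in range(n_iter):
        new_path = viterbi(obs, means, variances, log_trans, log_init)
        if new_path == path:  # segmentation stopped changing
            break
        path = new_path
        # Transition probabilities from counts along the decoded path,
        # with add-one smoothing (an assumption) to avoid log(0).
        counts = [[1.0] * n for _ in range(n)]
        for a, b in zip(path, path[1:]):
            counts[a][b] += 1.0
        log_trans = [[math.log(c / sum(row)) for c in row] for row in counts]
        # Per-state mean and variance from the samples assigned to that state;
        # states with no samples keep their previous parameters.
        for k in range(n):
            xs = [x for x, s in zip(obs, path) if s == k]
            if xs:
                m = sum(xs) / len(xs)
                means[k] = m
                variances[k] = max(sum((x - m) ** 2 for x in xs) / len(xs),
                                   var_floor)
    return path, means, variances


if __name__ == "__main__":
    data = [0.1, -0.2, 0.0, 0.2, 5.1, 4.9, 5.0, 5.2, 0.1, -0.1]
    print(segmental_kmeans(data, [1.0, 4.0], [1.0, 1.0]))
```

On the toy data in the demo, the decoded path separates the samples near 0 from those near 5. The enhancement procedure in the abstract follows the same pattern, but alternates between the state sequence and the clean-speech estimate rather than the model parameters.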
Pages: 1846-1856
Number of pages: 11