Voice activity detection based on multiple statistical models

Cited by: 182
Authors
Chang, Joon-Hyuk [1]
Kim, Nam Soo
Mitra, Sanjit K.
Affiliations
[1] Inha Univ, Dept Elect Engn, Inchon 402751, South Korea
[2] Seoul Natl Univ, Sch Elect Engn, Seoul 151742, South Korea
[3] Seoul Natl Univ, Inst New Media & Commun, Seoul 151742, South Korea
[4] Univ Calif Santa Barbara, Dept Elect & Comp Engn, Santa Barbara, CA 93106 USA
Keywords
discrete cosine transform (DCT); generalized gamma function; maximum likelihood;
DOI
10.1109/TSP.2006.874403
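Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809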
Abstract
One of the key issues in practical speech processing is to achieve voice activity detection (VAD) that is robust against background noise. Most statistical model-based approaches have employed the Gaussian assumption in the discrete Fourier transform (DFT) domain, which, however, deviates from real observations. In this paper, we propose a class of VAD algorithms based on several statistical models. In addition to the Gaussian model, we also incorporate the complex Laplacian and Gamma probability density functions into our analysis of statistical properties. With goodness-of-fit tests, we analyze the statistical properties of the DFT spectra of noisy speech under various noise conditions. Based on this statistical analysis, the likelihood ratio test under the given statistical models is established for the purpose of VAD. Since the statistical characteristics of the speech signal are affected differently by the noise types and levels, our approach aims to cope with time-varying environments by adaptively finding an appropriate statistical model in an online fashion. The performance of the proposed VAD approaches in both stationary and nonstationary noise environments is evaluated with the aid of an objective measure.
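For orientation, the Gaussian baseline that the abstract refers to can be sketched as follows. This is a minimal illustration of the textbook likelihood ratio test for zero-mean complex Gaussian DFT coefficients, not the algorithm proposed in the paper; the Laplacian and Gamma models and the online model selection are not shown. The function names, the per-bin variance inputs, and the zero threshold are assumptions made for this example.

```python
import math

def gaussian_log_lr(power_spectrum, noise_var, speech_var):
    """Per-bin log likelihood ratio, speech-present vs. noise-only.

    Assumes each DFT coefficient is zero-mean complex Gaussian with
    variance noise_var (noise only) or noise_var + speech_var (speech
    present). All three inputs are hypothetical per-bin sequences.
    """
    log_lrs = []
    for p, lam_n, lam_s in zip(power_spectrum, noise_var, speech_var):
        xi = lam_s / lam_n      # a priori SNR
        gamma = p / lam_n       # a posteriori SNR, p = |X_k|^2
        # log of (1 / (1 + xi)) * exp(gamma * xi / (1 + xi))
        log_lrs.append(gamma * xi / (1.0 + xi) - math.log1p(xi))
    return log_lrs

def is_speech(power_spectrum, noise_var, speech_var, threshold=0.0):
    """Declare speech when the mean per-bin log LR exceeds the threshold."""
    log_lrs = gaussian_log_lr(power_spectrum, noise_var, speech_var)
    return sum(log_lrs) / len(log_lrs) > threshold
```

In practice the noise and speech variances would have to be estimated from the signal, and the threshold tuned to the desired trade-off between missed speech and false alarms; both are left out of this sketch.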
Pages: 1965-1976
Number of pages: 12