An improved voice activity detection using higher order statistics

Cited by: 56
Authors
Li, K [1]
Swamy, MNS
Ahmad, MO
Affiliations
[1] Siemens, Beijing 100102, Peoples R China
[2] Concordia Univ, Dept Elect & Comp Engn, Montreal, PQ H3G 1M8, Canada
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2005 / Vol. 13 / No. 05
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
higher order statistics; low band to full band energy ratio; voice activity detection;
DOI
10.1109/TSA.2005.851955
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
In this paper, by using the properties of the higher order statistics (HOS) of speech and noise signals, we develop an improved voice activity detection (VAD) scheme. The proposed scheme employs the logarithm of the kurtosis of the LPC residual of a speech signal and is shown to be more effective and efficient in detecting active speech in medium to low signal-to-noise ratio (SNR) conditions without being unduly affected by the variations in the signal energy. To overcome the inability of the HOS to detect unvoiced speech, another metric (the low band to full band energy ratio) is introduced. Depending on the estimated mean SNR, the proposed scheme works adaptively in two modes: a simple mode using only the SNR, and an enhanced mode using the HOS, the low band to full band energy ratio and the SNR. This scheme is capable of avoiding unnecessary computations, while maintaining the same performance as that working only in the enhanced mode. Simulation results are presented to demonstrate the effectiveness of the proposed voice activity detection scheme.
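The two frame-level metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the LPC order, the kurtosis normalization (fourth central moment over squared variance), the 1 kHz low-band cutoff, the 8 kHz sampling rate, and all function names are assumptions chosen for illustration only. The record gives no thresholds or mode-switching rule, so none are shown.

```python
import numpy as np

def lpc_residual(frame, order=10):
    """Residual of an autocorrelation-method LPC fit (Levinson-Durbin).
    The order of 10 is an assumption, not a value taken from the paper."""
    x = np.asarray(frame, dtype=float)
    n = len(x)
    # Autocorrelation for lags 0..order
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0] + 1e-12  # small offset avoids division by zero on silent frames
    for i in range(1, order + 1):
        k = -np.dot(a[:i], r[i:0:-1]) / err   # reflection coefficient
        a[:i + 1] = a[:i + 1] + k * a[:i + 1][::-1]
        err *= (1.0 - k * k)
        if err <= 1e-12:                      # perfectly predictable frame
            break
    # Prediction error e[n] = x[n] + sum_j a[j] * x[n - j]
    return np.convolve(x, a)[:n]

def log_kurtosis(frame, order=10):
    """Log of the normalized kurtosis of the LPC residual (assumed definition)."""
    e = lpc_residual(frame, order)
    e = e - e.mean()
    m2 = np.mean(e ** 2) + 1e-12
    m4 = np.mean(e ** 4)
    return np.log(m4 / m2 ** 2 + 1e-12)

def low_to_full_band_ratio(frame, fs=8000, cutoff_hz=1000.0):
    """Energy below cutoff_hz divided by total energy (cutoff is hypothetical)."""
    x = np.asarray(frame, dtype=float)
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return power[freqs <= cutoff_hz].sum() / (power.sum() + 1e-12)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(4000)
    # Crude voiced-like signal: 100 Hz impulse train through a one-pole filter
    excitation = np.zeros(4000)
    excitation[::80] = 1.0
    voiced = np.zeros(4000)
    for i in range(4000):
        voiced[i] = excitation[i] + (0.9 * voiced[i - 1] if i else 0.0)
    print("log-kurtosis, noise :", log_kurtosis(noise))
    print("log-kurtosis, voiced:", log_kurtosis(voiced))
    print("band ratio, voiced  :", low_to_full_band_ratio(voiced))
```

In this toy setup the residual of the voiced-like signal is close to a sparse impulse train, so its kurtosis is far above that of Gaussian noise, while the measure itself does not depend on the overall signal scale. The band-energy ratio is the complementary cue the abstract introduces for unvoiced speech, where the HOS measure is not reliable.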
Pages: 965-974
Page count: 10