Extraction of audio features specific to speech production for multimodal speaker detection

Times cited: 18
Authors
Besson, Patricia [1]
Popovici, Vlad [2]
Vesin, Jean-Marc [1]
Thiran, Jean-Philippe [1]
Kunt, Murat [1]
Affiliations
[1] Swiss Federal Institute of Technology (EPFL), CH-1015 Lausanne, Switzerland
[2] Swiss Federal Institute of Technology, Bioinformatics Core Facility, CH-1015 Lausanne, Switzerland
Keywords
audio features; differential evolution; multimodal; mutual information; speaker detection; speech
DOI
10.1109/TMM.2007.911302
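Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812 [Computer Science and Technology]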
Abstract
A method that exploits an information theoretic framework to extract optimized audio features using video information is presented. A simple measure of mutual information (MI) between the resulting audio and video features allows the detection of the active speaker among different candidates. This method involves the optimization of an MI-based objective function. No approximation is needed to solve this optimization problem, neither for the estimation of the probability density functions (pdfs) of the features, nor for the cost function itself. The pdfs are estimated from the samples using a nonparametric approach. The challenging optimization problem is solved using a global method: the differential evolution algorithm. Two information theoretic optimization criteria are compared, and their ability to extract audio features specific to speech production is discussed. Using these specific audio features, candidate video features are then classified as members of the "speaker" or "non-speaker" class, resulting in a speaker detection scheme. Our method achieves a speaker detection rate of 100% on in-house test sequences and of 85% on the most commonly used sequences.
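The abstract says the feature pdfs are estimated nonparametrically and that a simple MI measure between audio and video features identifies the active speaker. The sketch below illustrates that idea only under stated assumptions: one scalar feature per frame for each stream, a Gaussian Parzen (kernel density) estimator with Silverman's rule-of-thumb bandwidth, and a resubstitution estimate of entropy. The function names (`mutual_information`, `detect_speaker`) and the synthetic data are hypothetical and are not the paper's implementation; it also assumes NumPy is available.

```python
import numpy as np

def silverman_bandwidth(x):
    """Rule-of-thumb bandwidth for a 1-D Gaussian kernel (Silverman)."""
    return 1.06 * np.std(x) * len(x) ** (-1 / 5)

def entropy_1d(x, h):
    """Resubstitution entropy estimate: H(X) ~ -mean(log p_hat(x_i)), Gaussian KDE."""
    d = (x[:, None] - x[None, :]) / h
    p = np.exp(-0.5 * d**2).mean(axis=1) / (h * np.sqrt(2 * np.pi))
    return -np.mean(np.log(p))

def entropy_2d(x, y, hx, hy):
    """Joint entropy estimate with a product Gaussian kernel (one bandwidth per axis)."""
    dx = (x[:, None] - x[None, :]) / hx
    dy = (y[:, None] - y[None, :]) / hy
    p = np.exp(-0.5 * (dx**2 + dy**2)).mean(axis=1) / (2 * np.pi * hx * hy)
    return -np.mean(np.log(p))

def mutual_information(a, v):
    """I(A;V) = H(A) + H(V) - H(A,V); the 1-D bandwidths are reused in the joint estimate."""
    ha, hv = silverman_bandwidth(a), silverman_bandwidth(v)
    return entropy_1d(a, ha) + entropy_1d(v, hv) - entropy_2d(a, v, ha, hv)

def detect_speaker(audio, candidates):
    """Hypothetical detector: pick the video candidate sharing the most MI with the audio."""
    scores = [mutual_information(audio, v) for v in candidates]
    return int(np.argmax(scores)), scores

# Synthetic example (not data from the paper): one candidate's video feature
# follows the audio feature, the other is independent noise.
rng = np.random.default_rng(0)
n = 300
audio = rng.normal(size=n)
silent = rng.normal(size=n)                   # unrelated candidate
speaker = audio + 0.3 * rng.normal(size=n)    # candidate correlated with the audio
idx, scores = detect_speaker(audio, [silent, speaker])
print("detected candidate:", idx)
```

Reusing the marginal bandwidths inside the joint estimate keeps the three entropy terms on a comparable smoothing scale; a leave-one-out estimate would reduce the bias introduced by each sample's own kernel term, at the cost of slightly more code.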
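The abstract solves the MI-based optimization with differential evolution (DE), a population-based global optimizer introduced by Storn and Price. The following is a minimal sketch of the classic DE/rand/1/bin scheme for minimizing a generic objective over box bounds. The parameter values (population size 20, F = 0.8, CR = 0.9, 200 generations) are common textbook defaults, not the settings reported in the paper, and the test function is a hypothetical stand-in; in the paper's setting the objective would be a negated MI-based criterion over the audio-feature parameters. It assumes NumPy is available.

```python
import numpy as np

def differential_evolution(f, bounds, pop_size=20, F=0.8, CR=0.9, generations=200, seed=0):
    """Minimize f over a box using DE/rand/1/bin; returns (best_x, best_cost)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    dim = len(lo)
    # Initial population drawn uniformly inside the bounds
    pop = lo + rng.random((pop_size, dim)) * (hi - lo)
    cost = np.array([f(x) for x in pop])
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: three distinct members, all different from the target i
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, size=3, replace=False)]
            mutant = np.clip(a + F * (b - c), lo, hi)
            # Binomial crossover; force at least one component from the mutant
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True
            trial = np.where(cross, mutant, pop[i])
            # Greedy selection: keep the trial vector if it is no worse
            trial_cost = f(trial)
            if trial_cost <= cost[i]:
                pop[i], cost[i] = trial, trial_cost
    best = int(np.argmin(cost))
    return pop[best], cost[best]

def shifted_sphere(x):
    """Hypothetical test objective with its minimum at (1, -2)."""
    return float(np.sum((x - np.array([1.0, -2.0])) ** 2))

x_best, f_best = differential_evolution(shifted_sphere, [(-5, 5), (-5, 5)])
print(x_best, f_best)
```

A smooth bowl like this one is easy for any optimizer; the reason to prefer a global method such as DE is that MI-based cost functions estimated from samples are typically multimodal, where gradient-based local search can stall in a poor optimum.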
Pages: 63-73
Page count: 11