Multimodal speaker identification using an adaptive classifier cascade based on modality reliability

Times cited: 44
Authors
Erzin, E [1]
Yemez, Y [1]
Tekalp, AM [1]
Affiliation
[1] Koc Univ, Coll Engn, Multimedia Vis & Graph Lab, TR-34450 Istanbul, Turkey
Keywords
classifier combining; modality reliability; multimodal speaker identification
DOI
10.1109/TMM.2005.854464
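Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812 [Computer science and technology]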
Abstract
We present a multimodal open-set speaker identification system that integrates information from the audio, face, and lip motion modalities. For fusion of multiple modalities, we propose a new adaptive cascade rule that favors reliable modality combinations through a cascade of classifiers. The order of the classifiers in the cascade is adaptively determined based on the reliability of each modality combination. We also propose a novel reliability measure, tailored to the open-set speaker identification problem, to assess the accept or reject decisions of a classifier. A formal framework based on the probability of correct decision is developed for analytical comparison of the proposed adaptive rule with other classifier combination rules. The proposed adaptive rule is more robust in the presence of unreliable modalities and outperforms the hard-level max rule and the soft-level weighted summation rule, provided that the employed reliability measure is effective in assessing classifier decisions. Experimental results supporting this assertion are provided.
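The abstract describes the adaptive cascade only at a high level. The following is a minimal, hypothetical sketch of one way such a cascade could operate, not the authors' method: each modality combination is assumed to output either a speaker label or a reject (None, meaning "not an enrolled speaker"), together with a per-decision reliability score. Stages are ordered by that score, and a stage settles the final decision only if its score reaches a per-combination threshold. The names `StageOutput` and `adaptive_cascade`, the scores, and the thresholds are all illustrative assumptions; the paper's actual reliability measure is not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StageOutput:
    """Output of one modality combination (hypothetical structure)."""
    name: str               # e.g. "audio", "face", "audio+lip"
    label: Optional[str]    # accepted speaker ID, or None for a reject decision
    reliability: float      # assumed per-decision reliability score
    threshold: float        # assumed per-combination trust threshold


def adaptive_cascade(outputs: List[StageOutput]) -> Optional[str]:
    """Return the decision of the first trusted stage, or None (reject)."""
    # Adaptive ordering: the most reliable combination is consulted first.
    ordered = sorted(outputs, key=lambda o: o.reliability, reverse=True)
    for out in ordered:
        # A stage settles the decision (accept OR reject) only when its
        # reliability clears that combination's threshold.
        if out.reliability >= out.threshold:
            return out.label
    # No stage is trusted: fall back to rejecting (open-set "unknown").
    return None
```

For example, if the audio stage reports `StageOutput("audio", "spk3", 0.6, 0.7)` and the face stage reports `StageOutput("face", "spk7", 0.5, 0.4)`, the audio stage is consulted first but falls below its threshold, so the cascade passes to the face stage and returns `"spk7"`. Per-combination thresholds are used here so that the ordering and the stopping test are not the same comparison; with a single global threshold the cascade would reduce to simply trusting the top-ranked stage.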
Pages: 840-852
Number of pages: 13