Robust joint audio-video localization in video conferencing using reliability information

Cited by: 22
Authors
Lo, D [1]
Goubran, RA
Dansereau, RM
Thompson, G
Schulz, D
Affiliations
[1] Carleton Univ, Dept Syst & Comp Engn, Ottawa, ON K1S 5B6, Canada
[2] Mitel Networks, Kanata, ON K2K 2W7, Canada
Keywords
audio; data fusion; localization; microphone array; reliability; video; video conferencing
DOI
10.1109/TIM.2004.831181
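Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification code
0808 [Electrical Engineering]; 0809 [Electronic Science and Technology]
Abstract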
This paper proposes a new method for performing joint audio-video talker localization that explores the reliability of the individual localization estimates such as audio, motion detection, and skin-color detection. The reliability information is estimated from the audio and video data separately. The proposed method then uses this reliability information in conjunction with a simple summing voter to dynamically discriminate erroneous outputs from the localizers while performing fusion on the localization results. Based on the voter output, a majority rule is then used to make the final decision of the active talker's current location. The results show that adding the reliability information during fusion improves localization performance when compared to audio only, motion detection only, skin-color detection only, and joint audio-video using straight summing fusion localization methods. The computational complexity of the proposed method is comparable to the existing ones.
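The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual formulas, so everything here is an assumption: the function names `fuse_votes` and `pick_location`, the idea of scoring a fixed set of candidate sectors, the use of reliability as a multiplicative weight in [0, 1], and the final decision taken as the sector with the largest summed score. It is meant only to show how reliability weighting can change the outcome relative to straight summing.

```python
from typing import Dict, List, Sequence


def fuse_votes(votes: Dict[str, Sequence[float]],
               reliability: Dict[str, float]) -> List[float]:
    """Reliability-weighted summing voter (hypothetical formulation).

    votes[name][i] is the score localizer `name` gives candidate sector i;
    reliability[name] is an assumed weight in [0, 1] for that localizer.
    """
    lengths = {len(v) for v in votes.values()}
    if len(lengths) != 1:
        raise ValueError("all localizers must score the same candidate sectors")
    total = [0.0] * lengths.pop()
    for name, scores in votes.items():
        weight = reliability[name]
        for i, score in enumerate(scores):
            total[i] += weight * score
    return total


def pick_location(total: Sequence[float]) -> int:
    """Return the index of the sector with the largest fused score."""
    return max(range(len(total)), key=lambda i: total[i])


# Illustrative data only: audio points at sector 1, both video cues at sector 0.
votes = {
    "audio": [0, 1, 0],
    "motion": [1, 0, 0],
    "skin_color": [1, 0, 0],
}
straight = {name: 1.0 for name in votes}  # straight summing fusion
weighted = {"audio": 0.9, "motion": 0.2, "skin_color": 0.3}  # assumed values

print(pick_location(fuse_votes(votes, straight)))  # 0
print(pick_location(fuse_votes(votes, weighted)))  # 1
```

With equal weights the two video cues outvote the audio estimate; once the video cues are marked as unreliable, the audio estimate determines the result.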
Pages: 1132-1139
Page count: 8