A graphical model for audiovisual object tracking

Times cited: 66
Authors
Beal, MJ
Jojic, N
Attias, H
Affiliations
[1] Univ Toronto, Dept Comp Sci, Toronto, ON M5S 3G4, Canada
[2] Microsoft Corp, Res, Redmond, WA 98052 USA
Keywords
audio; video; audiovisual; graphical models; generative models; probabilistic inference; Bayesian inference; variational methods; expectation-maximization (EM) algorithm; multimodal; multimedia; tracking; speaker modeling; speech; vision; microphone arrays; cameras; automatic calibrations
DOI
10.1109/TPAMI.2003.1206512
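Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 [Pattern recognition and intelligent systems]; 0812 [Computer science and technology]; 0835 [Software engineering]; 1405 [Intelligent science and technology]
Abstract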
We present a new approach to modeling and processing multimedia data. This approach is based on graphical models that combine audio and video variables. We demonstrate it by developing a new algorithm for tracking a moving object in a cluttered, noisy scene using two microphones and a camera. Our model uses unobserved variables to describe the data in terms of the process that generates them. It is therefore able to capture and exploit the statistical structure of the audio and video data separately, as well as their mutual dependencies. Model parameters are learned from data via an EM algorithm, and automatic calibration is performed as part of this procedure. Tracking is done by Bayesian inference of the object location from data. We demonstrate successful performance on multimedia clips captured in real world scenarios using off-the-shelf equipment.
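The abstract's core idea, that tracking is Bayesian inference of the object location from combined audio and video evidence, can be illustrated with a small sketch. This is a simplified illustration loosely inspired by the abstract, not the authors' model. It assumes (all hypothetically) a one-dimensional image row instead of full frames, a pure circular sample delay between the two microphones, a linear calibration delay = round(alpha * x + beta) from pixel column x to delay in samples, Gaussian noise in both modalities, a uniform prior over x, and hand-set values for alpha and beta, which the paper instead learns from data with EM. The function names are hypothetical.

```python
import math
import random


def audio_loglik(mic1, mic2, delay, sigma=0.1):
    """Gaussian log-likelihood (up to a constant) that mic2 equals mic1
    circularly delayed by `delay` samples."""
    n = len(mic1)
    sq_err = sum((mic2[t] - mic1[(t - delay) % n]) ** 2 for t in range(n))
    return -sq_err / (2 * sigma ** 2)


def video_loglik(row, template, pos, sigma=0.1):
    """Gaussian log-likelihood (up to a constant) of a 1-D image row, given
    the object template placed at column `pos` on a zero background."""
    width = len(row)
    expected = [0.0] * width
    for k, value in enumerate(template):
        expected[(pos + k) % width] = value  # wrap around for simplicity
    sq_err = sum((r - e) ** 2 for r, e in zip(row, expected))
    return -sq_err / (2 * sigma ** 2)


def posterior_over_position(mic1, mic2, row, template, alpha, beta):
    """Posterior p(x | audio, video) over pixel columns x, uniform prior."""
    log_post = []
    for x in range(len(row)):
        delay = round(alpha * x + beta)  # hypothetical linear calibration
        log_post.append(audio_loglik(mic1, mic2, delay)
                        + video_loglik(row, template, x))
    m = max(log_post)  # subtract the max before exponentiating for stability
    weights = [math.exp(lp - m) for lp in log_post]
    z = sum(weights)
    return [w / z for w in weights]


# Synthetic demo: object at column 5, hypothetical calibration alpha=1, beta=-3.
random.seed(0)
n_samples, width, true_x = 64, 16, 5
alpha, beta = 1.0, -3.0
template = [1.0, 2.0, 1.0]

mic1 = [random.gauss(0.0, 1.0) for _ in range(n_samples)]
true_delay = round(alpha * true_x + beta)
mic2 = [mic1[(t - true_delay) % n_samples] for t in range(n_samples)]

row = [0.0] * width
for k, value in enumerate(template):
    row[(true_x + k) % width] = value

posterior = posterior_over_position(mic1, mic2, row, template, alpha, beta)
best_x = max(range(width), key=lambda x: posterior[x])
print("MAP position:", best_x)
```

Adding the audio and video log-likelihoods corresponds to multiplying the two likelihoods, which is how each modality can resolve ambiguity left by the other; subtracting the maximum log value before exponentiating keeps the normalization numerically stable.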
Pages: 828-836
Number of pages: 9