Automatic meeting segmentation using dynamic Bayesian networks

Cited by: 33
Authors
Dielmann, Alfred [1]
Renals, Steve [1]
Affiliations
[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh EH8 9LW, Midlothian, Scotland
Keywords
multimodal; multistream; meeting actions
DOI
10.1109/TMM.2006.886337
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Multiparty meetings are a ubiquitous feature of organizations, and their automatic analysis and structuring would bring considerable economic benefits. In this paper, we are concerned with the segmentation and structuring of meetings (recorded using multiple cameras and microphones) into sequences of group meeting actions such as monologue, discussion and presentation. We outline four families of multimodal features, based on speaker turns, lexical transcription, prosody, and visual motion, that are extracted from the raw audio and video recordings. We relate these low-level features to more complex group behaviors using a modelling framework based on multistream dynamic Bayesian networks (DBNs). This yields an effective approach to the segmentation problem, with an action error rate of 12.2%, compared with 43% for an approach based on hidden Markov models. Moreover, the multistream DBN developed here leaves scope for many further improvements and extensions.
Pages: 25-36
Page count: 12