Automatic meeting segmentation using dynamic Bayesian networks

Cited by: 33

Authors:

Dielmann, Alfred [1]

Renals, Steve [1]

Affiliations:

[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh EH8 9LW, Midlothian, Scotland

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2007 / Vol. 9 / No. 01

Keywords:

multimodal; multistream; meeting actions;

DOI:

10.1109/TMM.2006.886337

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Multiparty meetings are a ubiquitous feature of organizations, and there are considerable economic benefits that would arise from their automatic analysis and structuring. In this paper, we are concerned with the segmentation and structuring of meetings (recorded using multiple cameras and microphones) into sequences of group meeting actions such as monologue, discussion and presentation. We outline four families of multimodal features based on speaker turns, lexical transcription, prosody, and visual motion that are extracted from the raw audio and video recordings. We relate these low-level features to more complex group behaviors using a modelling framework based on multistream dynamic Bayesian networks (DBNs). This yields an effective approach to the segmentation problem, with an action error rate of 12.2%, compared with 43% for an approach based on hidden Markov models. Moreover, the multistream DBN developed here leaves scope for many further improvements and extensions.

Pages: 25-36

Number of pages: 12

