Correlative Multilabel Video Annotation with Temporal Kernels

Cited by: 26

Authors:

Qi, Guo-Jun ^{[1]}

Hua, Xian-Sheng ^{[3]}

Rui, Yong ^{[3]}

Tang, Jinhui ^{[2]}

Mei, Tao ^{[3]}

Wang, Meng ^{[2]}

Zhang, Hong-Jiang ^{[3]}

Affiliations:

[1] Univ Sci & Technol China, Dept Automat, Hefei 230027, Anhui, Peoples R China

[2] Univ Sci & Technol China, Dept Elect Engn & Informat Sci, Hefei 230027, Anhui, Peoples R China

[3] Microsoft Corp, Beijing 100080, Peoples R China

Source:

ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS | 2008 / Vol. 5 / No. 1

Keywords:

Algorithms; Theory; Experimentation; Video annotation; multilabeling; concept correlation; temporal kernel

DOI:

10.1145/1404880.1404883

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Automatic video annotation is an important ingredient for semantic-level video browsing, search, and navigation, and it has received much attention in recent years. Research on this topic has evolved through two paradigms. In the first paradigm, each concept is annotated individually by a pre-trained binary classifier. However, this method ignores the rich information shared between video concepts and achieves only limited success. The second paradigm evolved from the first: it adds an extra step on top of the individual classifiers to fuse the multiple concept detections. However, the performance of these methods can be degraded by errors that propagate from the first step to the fusion step. In this article, another paradigm for video annotation is proposed to address these problems. The proposed Correlative Multilabel (CML) method annotates the concepts and models the correlations between them simultaneously in a single step, and so benefits from the complementary information between different labels. Furthermore, since video clips are composed of temporally ordered frame sequences, we extend the proposed method to exploit the rich temporal information in videos. Specifically, a temporal kernel is incorporated into the CML method based on the discriminative information between Hidden Markov Models (HMMs) that are learned from the videos. We compare the proposed approach with state-of-the-art approaches from the first and second paradigms on the widely used TRECVID data set. As will be shown, the proposed method achieves superior performance.

Pages: 27
