Unified Video Annotation via Multigraph Learning

Cited by: 344
Authors
Wang, Meng [1]
Hua, Xian-Sheng [1]
Hong, Richang [2]
Tang, Jinhui [2]
Qi, Guo-Jun [2]
Song, Yan [2]
Affiliations
[1] Microsoft Res Asia, Beijing 100080, Peoples R China
[2] Univ Sci & Technol China, Hefei 230027, Peoples R China
Keywords
Multimodal fusion; semi-supervised learning; video annotation
DOI
10.1109/TCSVT.2009.2017400
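Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract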
Learning-based video annotation is a promising approach to facilitating video retrieval, and it avoids the intensive labor costs of purely manual annotation. However, it frequently encounters several difficulties, such as insufficient training data and the curse of dimensionality. In this paper, we propose a method named optimized multigraph-based semi-supervised learning (OMG-SSL), which aims to tackle these difficulties simultaneously in a unified scheme. We show that various crucial factors in video annotation, including multiple modalities, multiple distance functions, and temporal consistency, all correspond to different relationships among video units, and hence can be represented by different graphs. These factors can therefore be dealt with simultaneously by learning with multiple graphs, namely, the proposed OMG-SSL approach. Unlike existing graph-based semi-supervised learning methods, which utilize only one graph, OMG-SSL integrates multiple graphs into a regularization framework in order to fully exploit their complementary information. We show that this scheme is equivalent to first fusing the multiple graphs and then conducting semi-supervised learning on the fused graph. Through an optimization approach, the method is able to assign suitable weights to the graphs. Furthermore, we show that the proposed method can be implemented through a computationally efficient iterative process. Extensive experiments on the TREC Video Retrieval Evaluation (TRECVID) benchmark demonstrate the effectiveness and efficiency of the proposed approach.
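The "fuse, then propagate" view described in the abstract can be illustrated with a minimal sketch. This is not the authors' OMG-SSL implementation: the abstract does not give the objective or the weight-optimization step, so the graph weights here are fixed by hand, and the propagation step is the standard local-and-global-consistency iteration f <- beta * S f + (1 - beta) * y, swapped in as an assumption. The function names, `beta`, and the toy graphs are all hypothetical.

```python
import numpy as np

def normalized_affinity(W):
    """Symmetric normalization D^{-1/2} W D^{-1/2}; isolated nodes stay zero."""
    d = W.sum(axis=1)
    safe_d = np.where(d > 0, d, 1.0)
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(safe_d), 0.0)
    return W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def fuse_graphs(graphs, weights):
    """Convex combination of normalized graphs (weights are fixed, NOT learned)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(a * normalized_affinity(W) for a, W in zip(weights, graphs))

def propagate(S, y, beta=0.9, n_iter=500):
    """Iterate f <- beta * S f + (1 - beta) * y (assumed update rule)."""
    f = y.astype(float).copy()
    for _ in range(n_iter):
        f = beta * (S @ f) + (1.0 - beta) * y
    return f

def edges_to_affinity(n, edges):
    """Build a symmetric 0/1 affinity matrix from an undirected edge list."""
    W = np.zeros((n, n))
    for i, j in edges:
        W[i, j] = W[j, i] = 1.0
    return W

# Toy data: six video units in two groups, {0, 1, 2} and {3, 4, 5}.
# graph_a stands in for one modality, graph_b for another relationship
# (e.g. temporal adjacency); both are made up for illustration.
graph_a = edges_to_affinity(6, [(0, 1), (1, 2), (3, 4), (4, 5)])
graph_b = edges_to_affinity(6, [(0, 2), (3, 5)])

# One positive label (unit 0), one negative label (unit 3), rest unlabeled.
y = np.array([1.0, 0.0, 0.0, -1.0, 0.0, 0.0])

S = fuse_graphs([graph_a, graph_b], weights=[0.7, 0.3])
scores = propagate(S, y)

# Closed-form fixed point of the same iteration, for comparison.
closed_form = np.linalg.solve(np.eye(6) - 0.9 * S, (1.0 - 0.9) * y)
```

In this sketch the sign of each entry of `scores` is the predicted label for the corresponding unit. Learning the graph weights jointly with the scores, as the abstract describes, would replace the fixed `weights` argument.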
Pages: 733-746
Number of pages: 14