Multi-modality video shot clustering with tensor representation

Cited by: 11

Authors:

Liu, Yanan [1]

Wu, Fei [1]

Affiliations:

[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310027, Peoples R China

Source:

MULTIMEDIA TOOLS AND APPLICATIONS | 2009 / Vol. 41 / Issue 01

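Funding:

National High Technology Research and Development Program of China (863 Program); National Natural Science Foundation of China;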

Keywords:

Multi-modality video shot clustering; TensorShot; Temporal-sequenced associated cooccurrence (TSAC); Dimensionality reduction; Affinity propagation clustering;

DOI:

10.1007/s11042-008-0220-5

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Video analysis and understanding is a challenging issue nowadays. Video data has multiple media modalities, which exhibit temporal-sequenced associated cooccurrence (TSAC). Traditionally, videos are represented as vectors in Euclidean space. Many learning algorithms are then applied to these vectors in a high-dimensional space for dimensionality reduction, classification, clustering, and recognition. However, the multiple modalities in video not only have their own properties but are also correlated with one another, whereas the simple vector representation weakens the power of these relatively independent modalities and even ignores their relations to some extent. Clustering is an important technique for multimedia data management. Recently, a powerful clustering algorithm named Affinity Propagation was devised. In this paper, we introduce a higher-order tensor framework for video analysis. In this framework, we represent the three modalities in video shots (image frame, audio stream, and transcript text) as data points in the form of third-order tensors. In addition, we present a dimension reduction method for the high-dimensional features of video shots that explicitly considers the manifold structure of the tensor space arising from temporal-sequenced associated co-occurring multimodal media data. We call this the TensorShot approach. We then use Affinity Propagation to cluster video shots in tensor form. Our algorithm preserves the intrinsic structure of the submanifold from which tensorshots are sampled. Experiments on the TRECVID2005 news video data set show that our algorithm achieves improved performance.
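The clustering step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it skips the dimension reduction step entirely. The toy 2x2x2 "shot" tensors, the `make_shot` helper, and the negative squared Frobenius distance as similarity are all assumptions made for illustration. Only the message-passing updates follow the standard Affinity Propagation formulation, here with the median similarity as the shared preference.

```python
from statistics import median

def make_shot(base):
    """Hypothetical toy shot: a 2x2x2 third-order tensor as nested lists."""
    return [[[base + 0.1 * (i + j + k) for k in range(2)]
             for j in range(2)] for i in range(2)]

def sq_dist(x, y):
    """Squared Frobenius distance between two equally shaped 3rd-order tensors."""
    return sum((a - b) ** 2
               for xm, ym in zip(x, y)
               for xr, yr in zip(xm, ym)
               for a, b in zip(xr, yr))

def affinity_propagation(S, damping=0.7, iters=300):
    """Return, for each point, the index of its exemplar (needs len(S) >= 2)."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iters):
        # r(i,k) <- s(i,k) - max over k' != k of [a(i,k') + s(i,k')], damped
        for i in range(n):
            for k in range(n):
                best = max(A[i][j] + S[i][j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best)
        # a(i,k) <- min(0, r(k,k) + sum of positive r(i',k), i' not in {i,k})
        # a(k,k) <- sum of positive r(i',k), i' != k; both damped
        for k in range(n):
            pos = [max(0.0, R[j][k]) for j in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                if i == k:
                    new = total
                else:
                    new = min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]

# Six toy shots forming two well-separated groups (values are made up).
shots = [make_shot(b) for b in (0.0, 1.0, 2.1, 10.0, 11.0, 12.3)]
n = len(shots)
S = [[-sq_dist(shots[i], shots[j]) for j in range(n)] for i in range(n)]
# Shared preference: median of the off-diagonal similarities.
pref = median(S[i][j] for i in range(n) for j in range(n) if i != j)
for i in range(n):
    S[i][i] = pref
labels = affinity_propagation(S)
print(labels)
```

Each entry of `labels` is the index of the shot chosen as exemplar for that shot, so shots sharing a label belong to the same cluster. The number of clusters is not fixed in advance; it is governed by the preference value placed on the diagonal of `S`.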

Citation

Pages: 93-109

Page count: 17
