Exploring video content structure for hierarchical summarization

Times cited: 58
Authors
Zhu, XQ [1]
Wu, XD
Fan, JP
Elmagarmid, AK
Aref, WG
Affiliations
[1] Univ Vermont, Dept Comp Sci, Burlington, VT 05405 USA
[2] Univ N Carolina, Dept Comp Sci, Charlotte, NC 28223 USA
[3] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
Keywords
hierarchical video summarization; video content hierarchy; video group detection; video scene detection; hierarchical clustering
DOI
10.1007/s00530-004-0142-7
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we propose a hierarchical video summarization strategy that explores video content structure to provide users with a scalable, multilevel video summary. First, video shot segmentation and keyframe extraction algorithms are applied to parse video sequences into physical shots and discrete keyframes. Next, an affinity (self-correlation) matrix is constructed to merge visually similar shots into clusters (supergroups). Because high visual similarity between shots does not necessarily mean that they belong to the same story unit, temporal information is also used: temporally adjacent shots (within a specified distance) in each supergroup are merged into a video group. We then propose a video scene detection algorithm that merges temporally or spatially correlated video groups into scenario units. This is followed by a scene clustering algorithm that eliminates visual redundancy among these units. A hierarchical video content structure with increasing granularity is thus constructed, running from clustered scenes through video scenes and video groups down to keyframes. Finally, we introduce a hierarchical video summarization scheme that applies different approaches at different levels of the video content hierarchy to construct the video summary either statically or dynamically. Extensive experiments on real-world videos have been performed to validate the effectiveness of the proposed approach.
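The supergroup and video group steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: each shot is assumed to be represented by a normalized color histogram, shot similarity is assumed to be histogram intersection, supergroups are formed by single-link thresholding of the affinity matrix (via union-find), and `threshold` and `max_gap` are hypothetical parameters standing in for the paper's similarity criterion and "specified distance".

```python
from typing import Dict, List, Sequence


def histogram_intersection(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity measure: overlap of two normalized histograms, in [0, 1]."""
    return sum(min(x, y) for x, y in zip(a, b))


def affinity_matrix(features: List[Sequence[float]]) -> List[List[float]]:
    """Pairwise shot-to-shot similarity (the 'self-correlation' matrix)."""
    n = len(features)
    return [[histogram_intersection(features[i], features[j]) for j in range(n)]
            for i in range(n)]


def supergroups(aff: List[List[float]], threshold: float) -> List[List[int]]:
    """Cluster visually similar shots regardless of time (single-link, assumed)."""
    n = len(aff)
    parent = list(range(n))

    def find(i: int) -> int:
        # Path-halving union-find lookup.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if aff[i][j] >= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    clusters: Dict[int, List[int]] = {}
    for i in range(n):  # shots are visited in temporal order, so each list is sorted
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values(), key=lambda c: c[0])


def video_groups(clusters: List[List[int]], max_gap: int) -> List[List[int]]:
    """Split each supergroup into runs of shots whose index gap is <= max_gap."""
    groups: List[List[int]] = []
    for cluster in clusters:
        current = [cluster[0]]
        for shot in cluster[1:]:
            if shot - current[-1] <= max_gap:
                current.append(shot)
            else:
                groups.append(current)
                current = [shot]
        groups.append(current)
    return sorted(groups, key=lambda g: g[0])


if __name__ == "__main__":
    # Eight toy shots with 3-bin histograms; shot 7 resembles shots 0 and 2
    # but occurs much later, so it ends up in its own video group.
    feats = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 0],
             [0, 0, 1], [0, 0, 1], [0, 0, 1], [1, 0, 0]]
    sg = supergroups(affinity_matrix(feats), threshold=0.8)
    print(sg)                             # [[0, 2, 7], [1, 3], [4, 5, 6]]
    print(video_groups(sg, max_gap=2))    # [[0, 2], [1, 3], [4, 5, 6], [7]]
```

The toy run shows why the temporal step matters: shot 7 is visually identical to shots 0 and 2 and joins their supergroup, yet it is split off into a separate video group because it lies too far away in time.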
Pages: 98-115
Number of pages: 18
Number of references: 53