Automatic video data structuring through shot partitioning and key-frame computing

Cited by: 21
Authors
Xiong, W [1]
Lee, JCM [1]
Ma, RH [1]
Affiliation
[1] HONG KONG UNIV SCI & TECHNOL, DEPT COMP SCI, KOWLOON, HONG KONG
Keywords
automatic video data structuring; video partitioning; image similarity measure; wavelet; invariant parameters; key-frame computing; key-frame pruning
DOI
10.1007/s001380050059
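Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 [Pattern recognition and intelligent systems]; 0812 [Computer science and technology]; 0835 [Software engineering]; 1405 [Intelligent science and technology];
Abstract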
In video processing, a common first step is to segment the videos into physical units, generally called shots. A shot is a video segment that consists of one continuous action. In general, these physical units need to be clustered to form more semantically significant units, such as scenes, sequences, programs, etc. This is the so-called story-based video structuring. Automatic video structuring is of great importance for video browsing and retrieval. The shots or scenes are usually described by one or several representative frames, called key frames. Viewed from a higher level, the key frames of some shots might be semantically redundant. In this paper, we propose automatic solutions to the problems of: (i) video partitioning, (ii) key-frame computing, and (iii) key-frame pruning. For the first problem, an algorithm called "net comparison" is devised. It is accurate and fast because it uses both statistical and spatial information in an image and does not have to process the entire image. For the last two problems, we develop an original image similarity criterion, which considers both spatial layout and detail content in an image. For this purpose, coefficients of wavelet decomposition are used to derive parameter vectors accounting for these two aspects. The parameters exhibit (quasi-)invariant properties, thus making the algorithm robust to many types of object/camera motion and scaling variance. The novel "seek and spread" strategy used in key-frame computing allows us to obtain a large representative range for the key frames. Inter-shot redundancy of the key frames is suppressed using the same image similarity measure. Experimental results demonstrate the effectiveness and efficiency of our techniques.
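The abstract does not spell out the "net comparison" algorithm, so the following Python sketch is not the authors' method. It only illustrates the general idea the abstract describes: deciding on a shot boundary from statistics (window means) taken at fixed spatial positions, without touching every pixel. The function names `window_means` and `detect_cuts`, the window layout, and all threshold values are assumptions chosen for illustration.

```python
from typing import List, Sequence

# A grayscale frame as rows of pixel intensities (assumed representation).
Frame = Sequence[Sequence[int]]


def window_means(frame: Frame, win: int = 4, step: int = 8) -> List[float]:
    """Mean intensity of win x win windows sampled every `step` pixels.

    With step > win, only a subset of the pixels is ever read.
    """
    height, width = len(frame), len(frame[0])
    means = []
    for top in range(0, height - win + 1, step):
        for left in range(0, width - win + 1, step):
            total = sum(
                frame[r][c]
                for r in range(top, top + win)
                for c in range(left, left + win)
            )
            means.append(total / (win * win))
    return means


def detect_cuts(
    frames: Sequence[Frame],
    win: int = 4,
    step: int = 8,
    mean_thresh: float = 20.0,   # hypothetical per-window change threshold
    ratio_thresh: float = 0.5,   # hypothetical fraction of changed windows
) -> List[int]:
    """Return indices i where a cut is declared between frame i-1 and frame i."""
    cuts = []
    prev = None
    for i, frame in enumerate(frames):
        cur = window_means(frame, win, step)
        if prev is not None:
            # Count windows whose mean intensity changed noticeably.
            changed = sum(1 for a, b in zip(prev, cur) if abs(a - b) > mean_thresh)
            if changed / len(cur) > ratio_thresh:
                cuts.append(i)
        prev = cur
    return cuts
```

For example, given two uniformly dark 16 x 16 frames followed by two uniformly bright ones, `detect_cuts` reports a single boundary at index 2. Real footage would need thresholds tuned on data, and gradual transitions such as fades or dissolves are not handled by this simple frame-pair test.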
Pages: 51-65
Number of pages: 15