Media segmentation using self-similarity decomposition

Cited by: 38
Authors
Foote, JT [1]
Cooper, ML [1]
Affiliation
[1] FX Palo Alto Lab, Palo Alto, CA 94304 USA
Source
STORAGE AND RETRIEVAL FOR MEDIA DATABASES 2003 | 2003 / Vol. 5021
Keywords
digital media and audio processing; video and audio segmentation; music and audio summarization and thumbnails
DOI
10.1117/12.476302
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We present a framework for analyzing the structure of digital media streams. Though our methods work for video, text, and audio, we concentrate on detecting the structure of digital music files. In the first step, spectral data is used to construct a similarity matrix calculated from inter-frame spectral similarity. The digital audio can be robustly segmented by correlating a kernel along the diagonal of the similarity matrix. Once segmented, spectral statistics of each segment are computed. In the second step, segments are clustered based on the self-similarity of their statistics. This reveals the structure of the digital music in a set of segment boundaries and labels. Finally, the music can be summarized by selecting clusters with repeated segments throughout the piece. The summaries can be customized for various applications based on the structure of the original music.
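The first step described in the abstract (a frame-by-frame similarity matrix, then a kernel correlated along its diagonal) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract names neither the similarity measure nor the kernel, so cosine similarity, a checkerboard-shaped kernel, zero padding at the borders, and the simple threshold-based peak picker are all assumptions made here. The function names and the `half_width` and `threshold` parameters are hypothetical.

```python
import numpy as np


def similarity_matrix(features):
    """Cosine similarity between every pair of frames (rows of `features`).

    Assumption: cosine similarity stands in for the inter-frame spectral
    similarity, which the abstract does not specify.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)  # guard against all-zero frames
    return unit @ unit.T


def checkerboard_kernel(half_width):
    """Square kernel: +1 in the within-segment quadrants, -1 in the cross quadrants.

    Assumption: a checkerboard shape; the abstract only says "a kernel".
    """
    sign = np.concatenate([-np.ones(half_width), np.ones(half_width)])
    return np.outer(sign, sign)


def novelty_score(sim, half_width):
    """Correlate the kernel with the similarity matrix along its main diagonal."""
    n = sim.shape[0]
    kernel = checkerboard_kernel(half_width)
    padded = np.pad(sim, half_width, mode="constant")  # zero-pad the borders
    score = np.empty(n)
    for i in range(n):
        # Window covers frames i - half_width .. i + half_width - 1.
        window = padded[i:i + 2 * half_width, i:i + 2 * half_width]
        score[i] = np.sum(window * kernel)
    return score


def pick_boundaries(score, threshold):
    """Return indices of local maxima of `score` that exceed `threshold`."""
    peaks = []
    for i in range(1, len(score) - 1):
        if score[i] > threshold and score[i] >= score[i - 1] and score[i] > score[i + 1]:
            peaks.append(i)
    return peaks


if __name__ == "__main__":
    # Toy input: 50 frames of one "spectrum" followed by 50 frames of another.
    frames = np.vstack([np.tile([1.0, 0.0], (50, 1)), np.tile([0.0, 1.0], (50, 1))])
    score = novelty_score(similarity_matrix(frames), half_width=8)
    print(pick_boundaries(score, threshold=0.5 * score.max()))
```

On the toy input, the score peaks at the frame where the second block begins, because the kernel rewards high similarity within each side of the candidate boundary and penalizes similarity across it. Per-segment statistics and the clustering of the second step would then operate on the frames between consecutive boundaries.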
Pages: 167-175
Number of pages: 9