Audio thumbnailing of popular music using chroma-based representations

Cited by: 142
Authors
Bartsch, MA [1]
Wakefield, GH [2]
Affiliations
[1] ATK Mission Res, Beavercreek, OH 45430 USA
[2] Univ Michigan, Dept Elect Engn & Comp Sci, Ann Arbor, MI 48109 USA
Funding
National Science Foundation (USA);
Keywords
audio summarization; chroma; feature extraction; musical structure; popular music;
DOI
10.1109/TMM.2004.840597
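Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 [Computer Science and Technology];
Abstract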
With the growing prevalence of large databases of multimedia content, methods for facilitating rapid browsing of such databases or the results of a database search are becoming increasingly important. However, these methods are necessarily media dependent. We present a system for producing short, representative samples (or "audio thumbnails") of selections of popular music. The system searches for structural redundancy within a given song with the aim of identifying something like a chorus or refrain. To isolate a useful class of features for performing such structure-based pattern recognition, we present a development of the chromagram, a variation on traditional time-frequency distributions that seeks to represent the cyclic attribute of pitch perception, known as chroma. The pattern recognition system itself employs a quantized chromagram that represents the spectral energy at each of the 12 pitch classes. We evaluate the system on a database of popular music and score its performance against a set of "ideal" thumbnail locations. Overall performance is found to be quite good, with the majority of errors resulting from songs that do not meet our structural assumptions.
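The abstract describes a quantized chromagram that collects spectral energy into 12 pitch classes. The following is a minimal sketch of that idea, not the authors' implementation: it assumes equal temperament with A4 tuned to 440 Hz and a precomputed list of spectral components (frequency and energy pairs). The names `pitch_class` and `chroma_vector` are hypothetical and chosen only for illustration.

```python
import math

A4_HZ = 440.0  # assumed tuning reference; the record does not state one


def pitch_class(freq_hz):
    """Map a frequency to one of 12 pitch classes (0 = C, ..., 9 = A)."""
    # MIDI note number under equal temperament, with A4 = note 69.
    midi = 69 + 12 * math.log2(freq_hz / A4_HZ)
    # Octave information is discarded by taking the note number modulo 12.
    return int(round(midi)) % 12


def chroma_vector(freqs_hz, energies):
    """Sum spectral energy into 12 pitch-class bins, normalized to sum to 1."""
    chroma = [0.0] * 12
    for f, e in zip(freqs_hz, energies):
        if f > 0:  # skip the DC component, which has no defined pitch
            chroma[pitch_class(f)] += e
    total = sum(chroma)
    return [c / total for c in chroma] if total > 0 else chroma
```

Because octaves are folded together, components at 220 Hz, 440 Hz, and 880 Hz all contribute to the same bin. A sequence of such vectors over time gives a chromagram that could then be compared frame against frame to look for repeated sections.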
Pages: 96-104
Page count: 9