Musical genre classification of audio signals

Cited by: 1350

Authors:

Tzanetakis, G [1]

Cook, P [1]

Affiliations:

[1] Princeton Univ, Dept Comp Sci, Princeton, NJ 08544 USA

Source:

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2002 / Vol. 10 / No. 5

Funding:

U.S. National Science Foundation

Keywords:

audio classification; beat analysis; feature extraction; musical genre classification; wavelets;

DOI:

10.1109/TSA.2002.800560

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Musical genres are categorical labels created by humans to characterize pieces of music. A musical genre is characterized by the common characteristics shared by its members. These characteristics typically are related to the instrumentation, rhythmic structure, and harmonic content of the music. Genre hierarchies are commonly used to structure the large collections of music available on the Web. Currently musical genre annotation is performed manually. Automatic musical genre classification can assist or replace the human user in this process and would be a valuable addition to music information retrieval systems. In addition, automatic musical genre classification provides a framework for developing and evaluating features for any type of content-based analysis of musical signals. In this paper, the automatic classification of audio signals into a hierarchy of musical genres is explored. More specifically, three feature sets for representing timbral texture, rhythmic content, and pitch content are proposed. The performance and relative importance of the proposed features are investigated by training statistical pattern recognition classifiers using real-world audio collections. Both whole-file and real-time frame-based classification schemes are described. Using the proposed feature sets, a classification accuracy of 61% for ten musical genres is achieved. This result is comparable to results reported for human musical genre classification.

Pages: 293-302

Page count: 10
