Classification of musical audio signals according to expressed mood or emotion has evident applications to content-based music retrieval in large databases. Wrapper selection is a dimension reduction method that has been proposed for improving classification performance. However, the technique is prone to overfitting the training data, which decreases the generalizability of the results. We claim that previous attempts to apply wrapper selection in the field of music information retrieval (MIR) have relied on inadequate analysis frameworks, yielding overfitted, biased results and hence disputable conclusions about the methods used. This paper presents a framework based on cross-indexing for obtaining realistic performance estimates of wrapper selection, taking into account the simplicity and generalizability of the classification models. The framework is applied to sets of film soundtrack excerpts that are consensually associated with particular basic emotions, comparing Naive Bayes, k-NN, and SVM classifiers with both forward selection (FS) and backward elimination (BE). k-NN with BE yields the most promising results: 56.5% accuracy with only four features. The most useful feature subset for k-NN combines mode majorness and key clarity with dynamic, rhythmic, and structural features.
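For concreteness, the sketch below illustrates generic wrapper selection with a k-NN classifier: backward elimination driven by an inner cross-validation loop, with the selected subset then scored on held-out data. It assumes scikit-learn's SequentialFeatureSelector and synthetic placeholder data; it is not the paper's cross-indexing framework, and the data dimensions and parameter values (k = 5, 10-fold CV, 20 candidate features) are illustrative assumptions rather than values from the study.

```python
# Minimal sketch of wrapper selection (backward elimination) with k-NN.
# NOT the paper's cross-indexing framework; data and parameters are
# illustrative placeholders, not taken from the study.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Placeholder data standing in for audio feature vectors and emotion labels.
X, y = make_classification(n_samples=360, n_features=20, n_informative=8,
                           n_classes=4, random_state=0)

# Hold out a test set: scoring the selected subset on the same data that
# drove the search is exactly the optimistic bias the paper warns about.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)

# Backward elimination down to four features; each elimination step is
# guided by inner cross-validated accuracy of the wrapped classifier.
selector = SequentialFeatureSelector(
    knn, n_features_to_select=4, direction="backward", cv=10)
selector.fit(X_train, y_train)

# Retrain on the selected subset and score on held-out data for a less
# biased estimate of how well the selection generalizes.
knn.fit(selector.transform(X_train), y_train)
print("Selected feature indices:", np.flatnonzero(selector.get_support()))
print("Held-out accuracy: %.3f" % knn.score(selector.transform(X_test), y_test))
```

The split between an inner loop that performs the search and outer held-out data that evaluates it mirrors the motivation for cross-indexing: the accuracy that guides the wrapper search is an optimistic estimate, so an independent evaluation layer is needed for a realistic performance figure.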