An experimental comparison of audio tempo induction algorithms

Cited by: 84
Authors
Gouyon, Fabien [1]
Klapuri, Anssi
Dixon, Simon
Alonso, Miguel
Tzanetakis, George
Uhle, Christian
Cano, Pedro
Affiliations
[1] Univ Pompeu Fabra, Barcelona 08002, Spain
[2] Tampere Univ Technol, FIN-33101 Tampere, Finland
[3] Austrian Res Inst Artificial Intelligence, A-1010 Vienna, Austria
[4] Ecole Natl Super Telecommun Bretagne, F-75634 Paris, France
[5] Univ Victoria, Victoria, BC V8W 2Y2, Canada
[6] Fraunhofer Inst Digital Media Technol, D-98693 Ilmenau, Germany
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2006 / Vol. 14 / No. 05
Keywords
benchmark; evaluation; tempo induction
DOI
10.1109/TSA.2005.858509
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We report on the tempo induction contest organized during the International Conference on Music Information Retrieval (ISMIR 2004) held at the University Pompeu Fabra in Barcelona, Spain, in October 2004. The goal of this contest was to evaluate state-of-the-art algorithms on the task of inducing the basic tempo (as a scalar, in beats per minute) from musical audio signals. To our knowledge, this is the first published large-scale cross-validation of audio tempo induction algorithms. Participants were invited to submit algorithms to the contest organizer in one of several allowed formats. No training data was provided. A total of 12 entries (representing the work of seven research teams) were evaluated, 11 of which are reported in this document. Results on the test set of 3199 instances were returned to the participants before they were made public. Anssi Klapuri's algorithm won the contest. This evaluation shows that tempo induction algorithms can reach over 80% accuracy for music with a constant tempo, if we do not insist on finding a specific metrical level. After the competition, the algorithms and results were analyzed in order to discover general lessons for the future development of tempo induction systems. One conclusion is that robust tempo induction entails processing frame features rather than onset lists. Further, we propose a new "redundant" approach to tempo induction, inspired by knowledge of human perceptual mechanisms, which combines multiple simpler methods using a voting mechanism. Machine emulation of human tempo induction is still an open issue. Many avenues for future work in audio tempo tracking are highlighted, for instance, the definition of the best rhythmic features and the most appropriate periodicity detection method. In order to stimulate further research, the contest results, annotations, evaluation software and part of the data are available at http://ismir2004.ismir.net/ISMIR-Contest.html.
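The abstract's distinction between finding the annotated tempo and "not insisting on a specific metrical level," together with the idea of combining several estimates by voting, can be illustrated with a minimal Python sketch. The 4% relative tolerance and the set of accepted metrical factors (2, 3, 1/2, 1/3) are assumptions here, not stated in the abstract. The `vote()` function is a hypothetical agreement-based combiner meant only to convey the flavor of a "redundant" approach; it is not the authors' method.

```python
# ASSUMPTION: an estimate is "correct" if within 4% of the target tempo.
TOLERANCE = 0.04
# ASSUMPTION: metrical levels accepted by the lenient measure.
METRICAL_FACTORS = (1.0, 2.0, 3.0, 1.0 / 2.0, 1.0 / 3.0)


def within_tolerance(estimate: float, target: float, tol: float = TOLERANCE) -> bool:
    """True if estimate lies within a relative tolerance of target (both in BPM)."""
    return abs(estimate - target) <= tol * target


def accuracy1(estimate: float, reference: float) -> bool:
    """Strict measure: the estimate must match the annotated tempo itself."""
    return within_tolerance(estimate, reference)


def accuracy2(estimate: float, reference: float) -> bool:
    """Lenient measure: the estimate may match the annotated tempo at another
    metrical level (double, triple, half, or third of the reference)."""
    return any(within_tolerance(estimate, reference * f) for f in METRICAL_FACTORS)


def score(estimates, references, metric) -> float:
    """Fraction of excerpts for which metric(estimate, reference) holds."""
    pairs = list(zip(estimates, references))
    if not pairs:
        raise ValueError("need at least one (estimate, reference) pair")
    return sum(metric(e, r) for e, r in pairs) / len(pairs)


def vote(candidates, tol: float = TOLERANCE) -> float:
    """HYPOTHETICAL combiner: for each candidate BPM, collect the candidates
    that agree with it within tol; return the mean of the largest such group."""
    if not candidates:
        raise ValueError("need at least one candidate tempo")
    best_group = []
    for c in candidates:
        group = [x for x in candidates if within_tolerance(x, c, tol)]
        if len(group) > len(best_group):
            best_group = group
    return sum(best_group) / len(best_group)


if __name__ == "__main__":
    estimates = [120.0, 61.0, 95.0]
    references = [121.0, 120.0, 100.0]
    print(score(estimates, references, accuracy1))  # strict: only the first matches
    print(score(estimates, references, accuracy2))  # lenient: 61 BPM counts as half of 120
    print(vote([118.0, 120.0, 121.0, 60.0, 90.0]))  # three estimates near 120 agree
```

In this toy data, the 61 BPM estimate fails the strict measure but passes the lenient one because it lies near half the reference tempo, which is the kind of metrical-level ambiguity the abstract refers to. The voting sketch ignores metrical relations between candidates; a fuller combiner might also let, say, 60 BPM vote for 120 BPM.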
Pages: 1832-1844
Number of pages: 13