Audio Signal Representations for Indexing in the Transform Domain

Times cited: 14
Authors
Ravelli, Emmanuel [1]
Richard, Gael [2]
Daudet, Laurent [1]
Affiliations
[1] Univ Paris 06, Inst Jean Rond Alembert LAM, F-75015 Paris, France
[2] Telecom ParisTech, Inst Telecom, CNRS LTCI, F-75014 Paris, France
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2010 / Vol. 18 / No. 3
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
Audio coding; audio indexing; sparse representations; time-frequency representations; MUSICAL GENRE CLASSIFICATION; BEAT; TEMPO;
DOI
10.1109/TASL.2009.2025099
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
Indexing audio signals directly in the transform domain can potentially save a significant amount of computation when working on a large database of signals stored in a lossy compression format, without having to fully decode the signals. Here, we show that the representations used in standard transform-based audio codecs (e.g., MDCT for AAC, or hybrid PQF/MDCT for MP3) have a sufficient time resolution for some rhythmic features, but a poor frequency resolution, which prevents their use in tonality-related applications. Alternatively, a recently developed audio codec based on a sparse multi-scale MDCT transform has a good resolution both for time- and frequency-domain features. We show that this new audio codec allows efficient transform-domain audio indexing for three different applications, namely beat tracking, chord recognition, and musical genre classification. We compare results obtained with this new audio codec and the two standard MP3 and AAC codecs, in terms of performance and computation time.
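The abstract's claim that standard codec transforms have poor frequency resolution can be made concrete with some simple arithmetic. The sketch below is illustrative and is not taken from the paper. It assumes 44.1 kHz audio, AAC blocks of 1024 (long) or 128 (short) MDCT coefficients, and MP3 granules of 576 (long) or 192 (short) frequency lines (32 PQF subbands followed by an 18- or 6-line MDCT). The helper names (`bin_spacing_hz`, `lowest_resolvable_midi`, and so on) are hypothetical. "Resolvable" uses a deliberately crude criterion: two adjacent semitones count as separable once they are at least one coefficient apart.

```python
"""Rough MDCT frequency-resolution check for standard codec block sizes.

Illustrative assumptions (not from the paper): 44.1 kHz sampling,
AAC blocks of 1024 (long) / 128 (short) coefficients, MP3 granules of
576 (long) / 192 (short) lines, i.e. 32 PQF subbands x 18 or 6 MDCT lines.
"""

FS = 44_100.0  # assumed sampling rate in Hz

# Number of frequency coefficients per transform block (assumed values)
GRIDS = {
    "AAC long": 1024,
    "AAC short": 128,
    "MP3 long": 576,
    "MP3 short": 192,
}


def bin_spacing_hz(n_coeffs: int, fs: float = FS) -> float:
    """Frequency spacing when 0..fs/2 is split into n_coeffs equal bins."""
    return (fs / 2.0) / n_coeffs


def midi_to_hz(m: int) -> float:
    """Equal-tempered pitch of MIDI note m (A4 = MIDI 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((m - 69) / 12.0)


def semitone_gap_hz(f0: float) -> float:
    """Distance in Hz from f0 to the semitone directly above it."""
    return f0 * (2.0 ** (1.0 / 12.0) - 1.0)


def lowest_resolvable_midi(spacing_hz: float) -> int:
    """Lowest MIDI note whose upward semitone gap is at least one bin.

    Crude criterion: neighbouring notes count as separable once they are
    at least one coefficient apart. Real separability also depends on
    the window shape and spectral leakage.
    """
    m = 0
    while semitone_gap_hz(midi_to_hz(m)) < spacing_hz:
        m += 1
    return m


if __name__ == "__main__":
    for name, n in GRIDS.items():
        df = bin_spacing_hz(n)
        m = lowest_resolvable_midi(df)
        print(f"{name:9s} ({n:4d} lines): {df:6.1f} Hz per bin, "
              f"semitones separable from MIDI {m} (~{midi_to_hz(m):.0f} Hz)")
```

Under these assumptions, the long AAC grid (about 21.5 Hz per bin) only separates semitones from roughly F#4 (370 Hz) upward. The long MP3 grid (about 38.3 Hz per bin) only does so from roughly E5 (659 Hz). Both thresholds lie well above the bass and lower-middle registers where many chord tones lie. This is consistent with the abstract's point that these representations are poorly suited to tonality-related tasks such as chord recognition, even though their short blocks give fine time resolution.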
Pages: 434-446
Number of pages: 13
References
40 in total
[1] [Anonymous], 2004, J NEGAT RESULTS SPEE
[2] Bell RE, 2004, J ENDOVASC THER, V11, P6
[3] Bello J. P., 2005, P 6 INT C MUSIC INFO, P304, DOI 10.5281/zenodo.1417431
[4] Bergstra, James; Casagrande, Norman; Erhan, Dumitru; Eck, Douglas; Kegl, Balazs. Aggregate features and ADABOOST for music classification. MACHINE LEARNING, 2006, 65(2-3): 473-484
[5] Davies, Matthew E. P.; Plumbley, Mark D. Context-dependent beat tracking of musical audio. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15(3): 1009-1020
[6] Davis, S. B.; Mermelstein, P. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1980, 28(4): 357-366
[7] Dixon, S. Automatic extraction of tempo and beat from expressive performances. JOURNAL OF NEW MUSIC RESEARCH, 2001, 30(1): 39-58
[8] FAAC, 2008, FAAC FAAD WEBP
[9] Fujishima T., 1999, P INT COMP MUS C, P464
[10] Gomez E., 2004, P INT C MUSIC INFORM, P92