Deep Scattering Spectrum

Cited by: 385
Authors
Anden, Joakim [1]
Mallat, Stephane [2]
Affiliations
[1] Ecole Polytech, Ctr Math Appl, F-91128 Palaiseau, France
[2] Ecole Normale Super, Dept Informat, F-75005 Paris, France
Keywords
Audio classification; deep neural networks; MFCC; modulation spectrum; wavelets; MODULATION; SPEECH
DOI
10.1109/TSP.2014.2326991
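Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract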
A scattering transform defines a locally translation-invariant representation which is stable to time-warping deformations. It extends MFCC representations by computing modulation spectrum coefficients of multiple orders, through cascades of wavelet convolutions and modulus operators. Second-order scattering coefficients characterize transient phenomena such as attacks and amplitude modulation. A frequency-transposition-invariant representation is obtained by applying a scattering transform along log-frequency. State-of-the-art classification results are obtained for musical genre and phone classification on the GTZAN and TIMIT databases, respectively.
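The cascade described in the abstract (wavelet convolution, modulus, then local averaging, repeated for a second order) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the Gaussian band-pass filters stand in for the wavelets, and the function names, centre frequencies, bandwidth ratio, and low-pass width are all assumptions chosen for illustration. The sketch keeps only second-order paths whose second filter has a lower centre frequency than the first, on the assumption that the envelope of a band-pass signal varies more slowly than the signal itself.

```python
import numpy as np

def gaussian_bandpass(n, center, bandwidth):
    """Frequency response of an analytic Gaussian band-pass filter.
    Assumption: a stand-in for a wavelet, not the filter used in the paper."""
    freqs = np.fft.fftfreq(n)  # cycles per sample, in [-0.5, 0.5)
    return np.exp(-0.5 * ((freqs - center) / bandwidth) ** 2)

def gaussian_lowpass(n, bandwidth):
    """Frequency response of the averaging (low-pass) filter, unit gain at DC."""
    freqs = np.fft.fftfreq(n)
    return np.exp(-0.5 * (freqs / bandwidth) ** 2)

def scattering(x, centers=(0.25, 0.125, 0.0625, 0.03125),
               q_ratio=0.25, lowpass_bw=0.005):
    """Zeroth-, first- and second-order scattering-style coefficients of x.
    All parameter values are hypothetical defaults for this illustration."""
    n = len(x)
    phi = gaussian_lowpass(n, lowpass_bw)
    psis = [gaussian_bandpass(n, c, q_ratio * c) for c in centers]

    def conv(sig, filt):
        # Circular convolution computed as a product in the Fourier domain.
        return np.fft.ifft(np.fft.fft(sig) * filt)

    s0 = conv(x, phi).real
    s1, s2 = [], {}
    for i, psi1 in enumerate(psis):
        u1 = np.abs(conv(x, psi1))            # wavelet modulus (envelope)
        s1.append(conv(u1, phi).real)         # first order: averaged envelope
        for j in range(i + 1, len(psis)):     # only lower centre frequencies
            u2 = np.abs(conv(u1, psis[j]))    # modulus of the envelope's band
            s2[(i, j)] = conv(u2, phi).real   # second order: averaged again
    return s0, np.array(s1), s2

# Usage: a pure tone versus the same tone with amplitude modulation.
n = 4096
t = np.arange(n)
x_tone = np.cos(2 * np.pi * 0.25 * t)
x_am = (1 + 0.8 * np.cos(2 * np.pi * 0.03125 * t)) * x_tone

_, s1_tone, s2_tone = scattering(x_tone)
_, s1_am, s2_am = scattering(x_am)

# Path (0, 3): first filter at the carrier, second at the modulation rate.
first_order_gap = abs(s1_am[0].mean() - s1_tone[0].mean())
second_order_ratio = s2_am[(0, 3)].mean() / s2_tone[(0, 3)].mean()
```

In this toy comparison the averaged first-order coefficient at the carrier frequency is nearly identical for the two signals, while the second-order coefficient on the path (carrier, modulation rate) is much larger for the modulated tone, which is the sense in which second-order coefficients characterize amplitude modulation.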
Pages: 4114-4128
Number of pages: 15