Deep Scattering Spectrum

Times cited: 385

Authors:

Anden, Joakim [1]

Mallat, Stephane [2]

Affiliations:

[1] Ecole Polytech, Ctr Math Appl, F-91128 Palaiseau, France

[2] Ecole Normale Super, Dept Informat, F-75005 Paris, France

Source:

IEEE TRANSACTIONS ON SIGNAL PROCESSING | 2014 / Vol. 62 / No. 16

Keywords:

Audio classification; deep neural networks; MFCC; modulation spectrum; wavelets; MODULATION; SPEECH;

DOI:

10.1109/TSP.2014.2326991

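Chinese Library Classification (CLC) numbers:

TM [Electrical Engineering Technology]; TN [Electronic Technology, Communication Technology];

Discipline classification codes:

0808; 0809;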

Abstract:

A scattering transform defines a locally translation-invariant representation that is stable to time-warping deformations. It extends MFCC representations by computing modulation spectrum coefficients of multiple orders through cascades of wavelet convolutions and modulus operators. Second-order scattering coefficients characterize transient phenomena such as attacks and amplitude modulation. A frequency-transposition-invariant representation is obtained by applying a scattering transform along log-frequency. State-of-the-art classification results are obtained for musical genre and phone classification on the GTZAN and TIMIT databases, respectively.
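The abstract describes scattering as a cascade of wavelet convolutions and modulus operators followed by time averaging: first order |x ⋆ ψλ1| ⋆ φ(t), second order ||x ⋆ ψλ1| ⋆ ψλ2| ⋆ φ(t). The NumPy sketch below illustrates that cascade under stated assumptions. It is not the authors' implementation. The Gaussian-shaped analytic "Gabor" filters and the Gaussian low-pass φ are defined directly in the Fourier domain. All helper names (gabor_filter_bank, lowpass, scattering) and parameter values (T, q1, j1, q2, j2, xi_max) are hypothetical. Convolutions are circular, and the subsampling, logarithm and normalization steps a practical system would use are omitted.

```python
import numpy as np


def gabor_filter_bank(n, q, j_max, xi_max=0.4):
    """Hypothetical analytic Gabor-like filters, defined in the Fourier domain.

    Center frequencies are xi_max * 2**(-k / q) (cycles/sample) for
    k = 0 .. q * j_max - 1, i.e. q filters per octave over j_max octaves.
    """
    omega = np.fft.fftfreq(n)  # normalized frequencies in [-0.5, 0.5)
    bank = []
    for k in range(q * j_max):
        xi = xi_max * 2.0 ** (-k / q)
        sigma = xi / (2 * q)  # assumed constant-Q bandwidth
        psi_hat = np.exp(-((omega - xi) ** 2) / (2 * sigma ** 2))
        psi_hat[omega < 0] = 0.0  # analytic: keep positive frequencies only
        bank.append((xi, psi_hat))
    return bank


def lowpass(n, T):
    """Assumed Gaussian low-pass phi_hat, averaging over roughly T samples."""
    omega = np.fft.fftfreq(n)
    sigma = 1.0 / T
    return np.exp(-(omega ** 2) / (2 * sigma ** 2))


def scattering(x, T=256, q1=8, j1=5, q2=1, j2=8):
    """Zeroth-, first- and second-order scattering with circular convolutions."""
    n = len(x)
    phi_hat = lowpass(n, T)
    x_hat = np.fft.fft(x)
    S0 = np.real(np.fft.ifft(x_hat * phi_hat))  # x * phi
    S1, S2 = [], []
    bank2 = gabor_filter_bank(n, q2, j2)
    for xi1, psi1_hat in gabor_filter_bank(n, q1, j1):
        U1 = np.abs(np.fft.ifft(x_hat * psi1_hat))  # |x * psi_lambda1|
        U1_hat = np.fft.fft(U1)
        S1.append(np.real(np.fft.ifft(U1_hat * phi_hat)))  # |x * psi_lambda1| * phi
        for xi2, psi2_hat in bank2:
            if xi2 >= xi1:
                continue  # coarse proxy: skip modulation frequencies at or above xi1
            U2 = np.abs(np.fft.ifft(U1_hat * psi2_hat))  # ||x * psi_lambda1| * psi_lambda2|
            S2.append(np.real(np.fft.ifft(np.fft.fft(U2) * phi_hat)))
    return S0, np.array(S1), np.array(S2)


rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
S0, S1, S2 = scattering(x)
print(S1.shape, S2.shape)  # (40, 1024) (200, 1024)
```

With these illustrative parameters, a 1024-sample signal yields 40 first-order and 200 second-order coefficient rows. Keeping only pairs with xi2 < xi1 is a simplification of the idea that the envelope |x ⋆ ψλ1| carries little energy beyond the bandwidth of the first-order filter.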


Pages: 4114-4128

Page count: 15

Number of references: 50
