Improvement to speech-music discrimination using sinusoidal model based features

Times cited: 17

Authors:

Shirazi, Jalil [1]

Ghaemmaghami, Shahrokh [2, 3]

Affiliations:

[1] Islamic Azad Univ, Sci & Res Branch, Tehran, Iran

[2] Sharif Univ Technol, Dept Elect Engn, Tehran, Iran

[3] Sharif Univ Technol, Elect Res Ctr, Tehran, Iran

Source:

MULTIMEDIA TOOLS AND APPLICATIONS | 2010 / Vol. 50 / No. 2

Keywords:

Audio classification; Sinusoidal model; Retrieval

DOI:

10.1007/s11042-009-0416-3

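Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline code:

080201 [Mechanical manufacturing and automation]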

Abstract:

This paper addresses model-based audio content analysis for classifying mixed speech-music audio signals into speech and music. A set of new features based on sinusoidal modeling of audio signals is presented and evaluated. The new feature set, which includes the variance of the birth frequencies and the duration of the longest frequency track in the sinusoidal model as measures of harmony and signal continuity, is introduced and discussed in detail. These features are used as inputs to an audio classifier and compared with typical features. The performance of the sinusoidal model features is evaluated by classifying audio into speech and music with both GMM (Gaussian Mixture Model) and SVM (Support Vector Machine) classifiers. Experimental results show that the proposed features are effective for speech/music discrimination: using only two sinusoidal model features extracted from 1-s segments of the signal, we achieved 96.84% classification accuracy. Experimental comparisons also confirm the superiority of the sinusoidal model features over popular time-domain and frequency-domain features in audio classification.
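The two features named in the abstract can be illustrated with a short sketch. It is a hypothetical reconstruction, not the authors' method: it assumes each 1-s segment has already been reduced to per-frame lists of spectral peak frequencies, links those peaks into frequency tracks with a simplified greedy nearest-neighbour rule (loosely in the spirit of McAulay-Quatieri peak continuation), and then computes the variance of the track birth frequencies and the duration of the longest track. The names `Track`, `build_tracks`, `sinusoidal_features`, `max_delta_hz` and `hop_seconds` are illustrative assumptions.

```python
from dataclasses import dataclass, field
from statistics import pvariance


@dataclass
class Track:
    """One frequency track: a run of consecutive frames with a matched peak."""
    start_frame: int
    freqs: list = field(default_factory=list)

    @property
    def birth_freq(self) -> float:
        # Frequency of the first peak, i.e. where the track was "born".
        return self.freqs[0]

    @property
    def length(self) -> int:
        # Number of frames the track survives.
        return len(self.freqs)


def build_tracks(frame_peaks, max_delta_hz=50.0):
    """Hypothetical greedy peak continuation (a simplification, not the paper's).

    frame_peaks: list over frames; each item is a list of peak frequencies in Hz.
    A peak continues an active track if it is the closest unused peak to that
    track's last frequency and lies within max_delta_hz; tracks are matched in
    order, so earlier tracks get first pick. Unmatched peaks start new tracks,
    and active tracks with no match in a frame die.
    """
    finished, active = [], []
    for t, peaks in enumerate(frame_peaks):
        unused = sorted(peaks)
        next_active = []
        for track in active:
            last = track.freqs[-1]
            best = min(unused, key=lambda f: abs(f - last), default=None)
            if best is not None and abs(best - last) <= max_delta_hz:
                track.freqs.append(best)   # track continues
                unused.remove(best)
                next_active.append(track)
            else:
                finished.append(track)     # track dies
        for f in unused:
            next_active.append(Track(start_frame=t, freqs=[f]))  # track birth
        active = next_active
    return finished + active


def sinusoidal_features(tracks, hop_seconds):
    """Return (variance of birth frequencies in Hz^2, longest track duration in s)."""
    births = [tr.birth_freq for tr in tracks]
    birth_var = pvariance(births) if len(births) > 1 else 0.0
    longest = max((tr.length for tr in tracks), default=0) * hop_seconds
    return birth_var, longest


if __name__ == "__main__":
    # Toy input: peak frequencies (Hz) detected in four consecutive frames.
    peaks = [[440.0, 880.0], [442.0, 879.0], [441.0], [441.0, 1500.0]]
    tracks = build_tracks(peaks)
    print(sinusoidal_features(tracks, hop_seconds=0.01))
```

With a 10 ms hop, for example, a 1-s segment spans about 100 frames, so the resulting two-element feature vector per segment is what would then be passed to a GMM or SVM classifier.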

Pages: 415-435

Number of pages: 21

References (29 in total):

[1] ABUQURAN AR, 2006, IEEE INT WORKSH MULT, P212

[2] AJMERA J, 2003, ELSEVIER T SPEECH CO, P351

[3] [Anonymous], 1973, Pattern Classification and Scene Analysis

[4] BABU J, 2007, IEEE ICSCN 2007, P16

[5] CORTES C, VAPNIK V, 1995, Support-vector networks, MACHINE LEARNING, V20(3), P273-297

[6] CORTIZO E, 2005, EUROCON, P1666

[7] ELMALEH K, 2000, ICASSP 2000, P2445

[8] GUO GD, LI SZ, 2003, Content-based audio classification and retrieval by support vector machines, IEEE TRANSACTIONS ON NEURAL NETWORKS, V14(1), P209-215

[9] JENSEN J, HANSEN JHL, 2001, Speech enhancement using a constrained iterative sinusoidal model, IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, V9(7), P731-740

[10] LAGRANGE M, 2007, J AUDIO ENG SOC, V55, P385
