Security monitoring using microphone arrays and audio classification

Cited by: 41

Authors:

Abu-El-Quran, Ahmad R. [1]

Goubran, Rafik A. [1]

Chan, Adrian D. C. [1]

Affiliations:

[1] Carleton Univ, Dept Syst & Comp Engn, Ottawa, ON K1S 5B6, Canada

Source:

IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT | 2006 / Vol. 55 / No. 04

Funding:

Natural Sciences and Engineering Research Council of Canada (NSERC);

Keywords:

audio classification; beamforming; feature extraction; microphone arrays; security monitoring; speech processing;

DOI:

10.1109/TIM.2006.876394

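Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification code:

0808 [Electrical Engineering]; 0809 [Electronic Science and Technology];

Abstract: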

In this paper, the authors propose a security monitoring system that can detect and classify the location and nature of different sounds within a room. This system is reliable and robust even in the presence of reverberation and in low signal-to-noise ratio (SNR) environments. The authors describe a novel algorithm for audio classification, which, first, classifies an audio segment as speech or nonspeech and, second, classifies nonspeech audio segments into a particular audio type. To classify an audio segment as speech or nonspeech, this algorithm divides the audio segment into frames, estimates the presence of pitch in each frame, and calculates a pitch ratio (PR) parameter; it is this PR parameter that is used to discriminate speech audio segments from nonspeech audio segments. The discerning threshold for the PR parameter is adaptive to accommodate different environments. A time-delayed neural network is employed to further classify nonspeech audio segments into an audio type. The performance of this novel audio classification algorithm is evaluated using a library of audio segments. This library includes both speech segments and nonspeech segments, such as windows breaking and footsteps. Evaluation is performed under different SNR environments, both with and without reverberation. Using 0.4-s audio segments, the proposed algorithm can achieve an average classification accuracy of 94.5% for the reverberant library and 95.1% for the nonreverberant library.
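The speech/nonspeech step described in the abstract (frame the segment, decide per frame whether pitch is present, then compute a pitch ratio) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the pitch detector, the frame length, or how the threshold adapts. The autocorrelation-based detector, the 30 ms frames, the 60-400 Hz search range, the fixed thresholds, the direction of the final comparison, and the names `frame_has_pitch`, `pitch_ratio`, and `is_speech_like` are all assumptions made for illustration.

```python
import numpy as np

def frame_has_pitch(frame, sample_rate, fmin=60.0, fmax=400.0, peak_threshold=0.5):
    """Assumed detector: pitch is 'present' if the normalized autocorrelation
    has a strong peak at a lag inside the assumed range [fmin, fmax] Hz."""
    frame = frame - np.mean(frame)
    energy = float(np.dot(frame, frame))
    if energy <= 1e-12:
        return False  # silent frame: treat as unpitched
    # Autocorrelation for non-negative lags, scaled so that lag 0 equals 1.
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:] / energy
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), len(frame) - 1)
    if lag_max <= lag_min:
        return False
    return bool(np.max(acf[lag_min:lag_max + 1]) >= peak_threshold)

def pitch_ratio(segment, sample_rate, frame_len=0.03, hop_len=0.015):
    """PR = (frames in which pitch was detected) / (total frames)."""
    n = int(frame_len * sample_rate)
    hop = int(hop_len * sample_rate)
    starts = range(0, len(segment) - n + 1, hop)
    flags = [frame_has_pitch(segment[s:s + n], sample_rate) for s in starts]
    return sum(flags) / len(flags) if flags else 0.0

def is_speech_like(segment, sample_rate, pr_threshold=0.5):
    """Assumption: a fixed threshold with 'higher PR means speech'.
    The paper uses an adaptive threshold whose rule is not given here."""
    return pitch_ratio(segment, sample_rate) >= pr_threshold

if __name__ == "__main__":
    sr = 16000
    t = np.arange(int(0.4 * sr)) / sr          # one 0.4-s segment
    tone = np.sin(2 * np.pi * 200.0 * t)       # strongly periodic test signal
    noise = np.random.default_rng(0).standard_normal(t.size)
    print("PR (tone): ", pitch_ratio(tone, sr))
    print("PR (noise):", pitch_ratio(noise, sr))
```

A synthetic tone and white noise only exercise the two extremes of the PR value. Real speech mixes voiced and unvoiced frames, so any practical threshold would have to be tuned or adapted to the environment, as the abstract notes.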


Pages: 1025-1032

Page count: 8

17 references in total

[1] Pitch-based feature extraction for audio classification [J]. Abu-El-Quran, AR; Goubran, RA. 2ND IEEE INTERNATIONAL WORKSHOP ON HAPTIC, AUDIO AND VISUAL ENVIRONMENTS AND THEIR APPLICATIONS - HAVE 2003, 2003: 43-47

[2] Abu-El-Quran AR, 2005, INT CONF ACOUST SPEE, P305

[3] IMAGE METHOD FOR EFFICIENTLY SIMULATING SMALL-ROOM ACOUSTICS [J]. ALLEN, JB; BERKLEY, DA. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1979, 65 (04): 943-950

[4] [Anonymous], P IEEE C INSTR MEAS

[5] Content-based retrieval of music and audio [J]. Foote, JT. MULTIMEDIA STORAGE AND ARCHIVING SYSTEMS II, 1997, 3229: 138-147

[6] MULTILAYER FEEDFORWARD NETWORKS ARE UNIVERSAL APPROXIMATORS [J]. HORNIK, K; STINCHCOMBE, M; WHITE, H. NEURAL NETWORKS, 1989, 2 (05): 359-366

[7] Johnson D. H., 1993, ARRAY SIGNAL PROCESS, P112

[8] Robust joint audio-video localization in video conferencing using reliability information [J]. Lo, D; Goubran, RA; Dansereau, RM; Thompson, G; Schulz, D. IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2004, 53 (04): 1132-1139

[9] Lu GJ, 1998, ICSP '98: 1998 FOURTH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, PROCEEDINGS, VOLS I AND II, P1142, DOI 10.1109/ICOSP.1998.770818

[10] SNR estimation of speech signals using subbands and fourth-order statistics [J]. Nemer, E; Goubran, R; Mahmoud, S. IEEE SIGNAL PROCESSING LETTERS, 1999, 6 (07): 171-174
