Audio-based context recognition

Cited by: 309
Authors
Eronen, AJ [1 ]
Peltonen, VT
Tuomi, JT
Klapuri, AP
Fagerlund, S
Sorsa, T
Lorho, G
Huopaniemi, J
Affiliations
[1] Nokia Res Ctr, FIN-33721 Tampere, Finland
[2] Nokia Mobile Phones, FIN-33721 Tampere, Finland
[3] Tampere Univ Technol, Inst Signal Proc, FIN-33101 Tampere, Finland
[4] Aalto Univ, Lab Acoust & Audio Signal Proc, FIN-02015 Espoo, Finland
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2006, Vol. 14, Issue 1
Keywords
audio classification; context awareness; feature extraction; hidden Markov models (HMMs);
DOI
10.1109/TSA.2005.854103
CLC number
O42 [Acoustics];
Discipline codes
070206 [Acoustics]; 082403 [Underwater Acoustic Engineering];
Abstract
The aim of this paper is to investigate the feasibility of an audio-based context recognition system. Here, context recognition refers to the automatic classification of the context, or environment, around a device. A system is developed and its accuracy is compared to that of human listeners on the same task. Particular emphasis is placed on the computational complexity of the methods, since the application is of particular interest in resource-constrained portable devices. Simple low-dimensional feature vectors are evaluated against more standard spectral features. Using discriminative training, competitive recognition accuracies are achieved with very low-order hidden Markov models (1-3 Gaussian components). A slight improvement in recognition accuracy is observed when linear data-driven feature transformations are applied to mel-cepstral features. The recognition rate of the system as a function of the test sequence length appears to converge only after about 30 to 60 s, although some degree of accuracy can be achieved even with test sequences shorter than 1 s. The average reaction time of the human listeners was 14 s, i.e., somewhat smaller than, but of the same order as, that of the system. In distinguishing between 24 everyday contexts, the average recognition accuracy was 58% for the system against 69% obtained in the listening tests; in recognizing six high-level classes, the accuracies were 82% for the system and 88% for the subjects.
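The abstract's front end is built on mel-cepstral (MFCC) features. As an illustrative sketch only (not the authors' implementation; the frame length, filter count, and coefficient order below are assumptions, since the abstract does not specify them), MFCC extraction for a single audio frame can be written as:

```python
import numpy as np

def hz_to_mel(f):
    # Hz -> mel scale (O'Shaughnessy formula).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # mel scale -> Hz (inverse of hz_to_mel).
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, sr=16000, n_filters=26, n_ceps=13):
    """Mel-frequency cepstral coefficients for one frame of audio.

    Parameter values (sr, n_filters, n_ceps) are illustrative defaults,
    not taken from the paper.
    """
    n_fft = len(frame)
    # Power spectrum of the Hamming-windowed frame.
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2
    freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    hz_pts = mel_to_hz(mel_pts)
    fbank = np.zeros(n_filters)
    for i in range(n_filters):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = np.clip((freqs - lo) / (mid - lo), 0.0, 1.0)
        falling = np.clip((hi - freqs) / (hi - mid), 0.0, 1.0)
        fbank[i] = np.sum(spectrum * np.minimum(rising, falling))
    log_energy = np.log(fbank + 1e-10)
    # DCT-II decorrelates the log filterbank energies into cepstra.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps),
                                  (2 * n + 1) / (2.0 * n_filters)))
    return dct @ log_energy
```

In the paper's setting, sequences of such vectors (optionally passed through a linear, data-driven transform) would be modeled per context class by a low-order HMM with 1-3 Gaussian components per state.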
Pages: 321-329
Number of pages: 9