Classification of emotional speech using 3DEC hierarchical classifier

Cited by: 28
Authors
Hassan, A. [1 ]
Damper, R. I. [1 ]
Affiliation
[1] Univ Southampton, Syst Res Grp, Sch Elect & Comp Sci, Southampton SO17 1BJ, Hants, England
Keywords
Speech processing; Emotion recognition; Valence-arousal model; Multiclass support vector machines; RECOGNITION; SIMULATION; FEATURES;
DOI
10.1016/j.specom.2012.03.003
Chinese Library Classification
O42 [Acoustics];
Subject classification codes
070206; 082403;
Abstract
The recognition of emotion from speech acoustics is an important problem in human-machine interaction, with many potential applications. In this paper, we first compare four ways to extend binary support vector machines (SVMs) to multiclass classification for recognising emotions from speech, namely two standard SVM schemes (one-versus-one and one-versus-rest) and two other methods (DAG and UDT) that form a hierarchy of classifiers, each making a distinct binary decision about class membership. These are trained and tested using 6552 features per speech sample extracted from three databases of acted emotional speech (DES, Berlin and Serbian) and a database of spontaneous speech (FAU Aibo Emotion Corpus) using the OpenEAR toolkit. Analysis of the errors made by these classifiers leads us to apply non-metric multi-dimensional scaling (NMDS) to produce a compact (two-dimensional) representation of the data suitable for guiding the choice of decision hierarchy. This representation can be interpreted in terms of the well-known valence-arousal model of emotion. We find that this model does not give a particularly good fit to the data: although the arousal dimension can be identified easily, valence is not well represented in the transformed data. We describe a new hierarchical classification technique whose structure is based on NMDS, which we call Data-Driven Dimensional Emotion Classification (3DEC). This new method is compared with the best of the four classifiers studied earlier and a state-of-the-art classification method on all four databases. We find no significant difference between these three approaches with respect to speaker-dependent performance. However, for the much more interesting and important case of speaker-independent emotion classification, 3DEC significantly outperforms the competitors. (C) 2012 Elsevier B.V. All rights reserved.
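The abstract's pipeline (compare multiclass SVM decompositions, then embed the classifiers' confusion structure in two dimensions with NMDS) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the synthetic four-class data stands in for the 6552-dimensional OpenEAR feature vectors, and only the one-versus-one and one-versus-rest schemes are shown (not DAG, UDT or 3DEC itself).

```python
# Sketch (assumed setup, not the paper's code): compare one-versus-one and
# one-versus-rest multiclass SVMs, then apply non-metric MDS to a
# confusion-derived dissimilarity matrix, as the abstract describes.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.manifold import MDS
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

# Synthetic 4-class data standing in for acoustic emotion features.
X, y = make_classification(n_samples=400, n_features=20, n_informative=10,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Two standard ways to extend binary SVMs to multiclass problems.
for name, clf in [("one-vs-one", OneVsOneClassifier(SVC(kernel="rbf"))),
                  ("one-vs-rest", OneVsRestClassifier(SVC(kernel="rbf")))]:
    clf.fit(X_tr, y_tr)
    print(name, "accuracy:", clf.score(X_te, y_te))

# Build a symmetric class dissimilarity matrix from the confusion matrix:
# classes that are frequently confused are treated as "close".
ovo = OneVsOneClassifier(SVC(kernel="rbf")).fit(X_tr, y_tr)
cm = confusion_matrix(y_te, ovo.predict(X_te)).astype(float)
sim = (cm + cm.T) / 2.0                    # symmetrise confusions
np.fill_diagonal(sim, sim.max())           # each class maximally similar to itself
dis = sim.max() - sim                      # similarity -> dissimilarity (zero diagonal)

# Non-metric MDS preserves only the rank order of dissimilarities; the
# resulting 2-D class layout is the kind of map used to guide a decision
# hierarchy in the paper.
nmds = MDS(n_components=2, metric=False, dissimilarity="precomputed",
           random_state=0)
coords = nmds.fit_transform(dis)
print(coords.shape)  # (4, 2): one 2-D point per class
```

In the paper the hierarchy is then built over such a map (splitting along the arousal-like axis first); here the embedding is only computed, since the 3DEC splitting rule itself is not specified in the abstract.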
Pages: 903-916
Page count: 14