Emotional speech recognition: Resources, features, and methods

Cited by: 542
Authors
Ververidis, Dimitrios [1]
Kotropoulos, Constantine [1]
Affiliation
[1] Aristotle University of Thessaloniki, Artificial Intelligence and Information Analysis Laboratory, Department of Informatics, Thessaloniki 54124, Greece
Keywords
emotions; emotional speech data collections; emotional speech classification; stress; interfaces; acoustic features
DOI
10.1016/j.specom.2006.04.003
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification code
070206; 082403
Abstract
In this paper we survey emotional speech recognition with three goals in mind. The first goal is to provide an up-to-date record of the available emotional speech data collections. The number of emotional states, the language, the number of speakers, and the kind of speech are briefly addressed for each collection. The second goal is to present the acoustic features most frequently used for emotional speech recognition and to assess how emotion affects them. Typical features are the pitch, the formants, the vocal tract cross-section areas, the mel-frequency cepstral coefficients, the Teager energy operator-based features, the intensity of the speech signal, and the speech rate. The third goal is to review techniques for classifying speech into emotional states. We examine classification techniques that exploit timing information separately from those that ignore it. Classification techniques based on hidden Markov models, artificial neural networks, linear discriminant analysis, k-nearest neighbors, and support vector machines are reviewed. (c) 2006 Elsevier B.V. All rights reserved.
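The abstract names the Teager energy operator (TEO), signal intensity, and k-nearest-neighbor classification without giving details. The sketch below is not the authors' implementation; it assumes Kaiser's common discrete form of the operator, psi[n] = x[n]^2 - x[n-1] x[n+1], uses RMS amplitude as a simple intensity proxy, and classifies with a plain Euclidean k-NN vote. The two-dimensional feature vector, the helper names (`utterance_features`, `tone`, `knn_predict`), the arousal labels, and the synthetic cosine "utterances" are all hypothetical choices made for illustration; real systems extract such features frame by frame from recorded speech.

```python
import math
from collections import Counter


def teager_energy(x):
    """Discrete Teager energy operator in Kaiser's form:
    psi[n] = x[n]**2 - x[n-1] * x[n+1], evaluated for 1 <= n <= len(x) - 2."""
    if len(x) < 3:
        raise ValueError("the operator needs at least 3 samples")
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def rms_intensity(x):
    """Root-mean-square amplitude, used here as a simple intensity proxy."""
    return math.sqrt(sum(s * s for s in x) / len(x))


def utterance_features(x):
    """Hypothetical 2-D utterance-level vector: (mean TEO, RMS intensity)."""
    teo = teager_energy(x)
    return (sum(teo) / len(teo), rms_intensity(x))


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors closest to `query`
    in Euclidean distance. `train` is a list of (features, label) pairs."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def tone(amplitude, omega, n=400):
    """Synthetic stand-in for an utterance: a sampled cosine, not real speech."""
    return [amplitude * math.cos(omega * i) for i in range(n)]


# Toy training set with hypothetical arousal labels (illustration only).
train = [
    (utterance_features(tone(0.20, 0.05)), "low_arousal"),
    (utterance_features(tone(0.25, 0.06)), "low_arousal"),
    (utterance_features(tone(0.90, 0.30)), "high_arousal"),
    (utterance_features(tone(1.00, 0.35)), "high_arousal"),
]
prediction = knn_predict(train, utterance_features(tone(0.95, 0.32)), k=3)
print(prediction)  # high_arousal
```

For a sampled cosine A cos(Omega n), the operator returns the constant A^2 sin^2(Omega), so the TEO feature grows with both amplitude and frequency, whereas RMS intensity depends on amplitude alone.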
Pages: 1162-1181
Page count: 20