Emotional speech recognition: Resources, features, and methods

Cited by: 542
Authors
Ververidis, Dimitrios [1]
Kotropoulos, Constantine [1]
Affiliations
[1] Aristotle Univ Thessaloniki, Artificial Intelligence & Informat Anal Lab, Dept Informat, Thessaloniki 54124, Greece
Keywords
emotions; emotional speech data collections; emotional speech classification; stress; interfaces; acoustic features;
DOI
10.1016/j.specom.2006.04.003
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
In this paper we overview emotional speech recognition with three goals in mind. The first goal is to provide an up-to-date record of the available emotional speech data collections. The number of emotional states, the language, the number of speakers, and the kind of speech are briefly addressed. The second goal is to present the most frequent acoustic features used for emotional speech recognition and to assess how emotion affects them. Typical features are the pitch, the formants, the vocal tract cross-section areas, the mel-frequency cepstral coefficients, the Teager energy operator-based features, the intensity of the speech signal, and the speech rate. The third goal is to review appropriate techniques for classifying speech into emotional states. We examine classification techniques that exploit timing information separately from those that ignore it. Classification techniques based on hidden Markov models, artificial neural networks, linear discriminant analysis, k-nearest neighbors, and support vector machines are reviewed. (c) 2006 Elsevier B.V. All rights reserved.
Pages: 1162-1181
Number of pages: 20