Automatic discrimination between laughter and speech

Cited by: 101
Authors
Truong, Khiet P. [1 ]
van Leeuwen, David A. [1 ]
Affiliation
[1] TNO Human Factors, Dept Human Interfaces, NL-3769 ZG Soesterberg, Netherlands
Keywords
automatic laughter detection; automatic emotion detection
DOI
10.1016/j.specom.2007.01.001
Chinese Library Classification
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
Emotions can be recognized by audible paralinguistic cues in speech. By detecting these paralinguistic cues, which can consist of laughter, a trembling voice, coughs, changes in the intonation contour, etc., information about the speaker's state and emotion can be revealed. This paper describes the development of a gender-independent laugh detector with the aim of enabling automatic emotion recognition. Different types of features (spectral, prosodic) for laughter detection were investigated using different classification techniques (Gaussian Mixture Models, Support Vector Machines, Multi-Layer Perceptrons) often used in language and speaker recognition. Classification experiments were carried out with short pre-segmented speech and laughter segments extracted from the ICSI Meeting Recorder Corpus (with a mean duration of approximately 2 s). Equal error rates of around 3% were obtained when tested on speaker-independent speech data. We found that a fusion between classifiers based on Gaussian Mixture Models and classifiers based on Support Vector Machines increases discriminative power. We also found that a fusion between classifiers that use spectral features and classifiers that use prosodic information usually increases the performance for discrimination between laughter and speech. Our acoustic measurements showed differences between laughter and speech in mean pitch and in the ratio of the durations of unvoiced to voiced portions, which indicates that these prosodic features are indeed useful for discrimination between laughter and speech. (C) 2007 Published by Elsevier B.V.
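The abstract's core technique is score-level fusion of a GMM-based and an SVM-based laughter/speech classifier. The following is a minimal sketch of that idea, not the authors' implementation: it assumes pre-extracted fixed-length feature vectors (stand-ins for the paper's spectral/prosodic segment features), uses scikit-learn, and fuses z-normalized scores with equal weights (the paper trains its fusion; the equal weights, the synthetic data, and all helper names here are illustrative assumptions).

```python
# Sketch: score-level fusion of GMM and SVM laughter/speech classifiers.
# Assumes each segment is already summarized as one feature vector.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def train_gmm_pair(X, y, n_components=8):
    """Fit one GMM per class; the GMM score is a log-likelihood ratio."""
    gmm_laugh = GaussianMixture(n_components, covariance_type="diag").fit(X[y == 1])
    gmm_speech = GaussianMixture(n_components, covariance_type="diag").fit(X[y == 0])
    return gmm_laugh, gmm_speech

def gmm_scores(gmm_laugh, gmm_speech, X):
    # log p(x | laughter) - log p(x | speech), higher = more laughter-like
    return gmm_laugh.score_samples(X) - gmm_speech.score_samples(X)

# Toy data standing in for spectral/prosodic segment features (1 = laughter).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 12))
y = (rng.random(400) < 0.5).astype(int)
X[y == 1] += 0.8  # shift the "laughter" class so the toy task is learnable

scaler = StandardScaler().fit(X)
Xs = scaler.transform(X)

gmm_laugh, gmm_speech = train_gmm_pair(Xs, y)
svm = SVC(kernel="rbf").fit(Xs, y)

# Score-level fusion: z-normalize each classifier's scores so they are on a
# comparable scale, then combine (equal weights as a simplification).
s_gmm = gmm_scores(gmm_laugh, gmm_speech, Xs)
s_svm = svm.decision_function(Xs)
znorm = lambda s: (s - s.mean()) / s.std()
fused = 0.5 * znorm(s_gmm) + 0.5 * znorm(s_svm)
print("fused laughter scores (higher = more laughter-like):", fused[:5])
```

The z-normalization step matters because GMM log-likelihood ratios and SVM margins live on different scales; without rescaling, one classifier would dominate the fused score. Thresholding `fused` then yields the detection decision whose equal error rate can be measured.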
Pages: 144-158
Page count: 15